WebVeta - Advanced, unified, consistent search for your website(s), from content of your website(s), blogs(s). First 50 customers, who sign-up prior to 15/05/2024 get unlimited access to existing features, newer features for at least 1 year. Sign up now! https://webveta.alightservices.com/
Categories
.Net C# Solr

Some SolrNET C# programming tips

SolrNet – is a .Net based library for interacting with Solr using C#.

Solr is a full-text engine server built on top of Apache Lucene. Apache Lucene is a full-text engine.

SolrNet is a C# library for easily generating the REST calls for interacting with Solr server.

One of the most important class is the QueryOptions class. The QueryOptions class allows to specify several options and probably some options need own blog posts.

For paging the results, the following options can be used:

var pageNumber = 2;

var options = new QueryOptions()
            {
                Rows = 10,               
                StartOrCursor = new StartOrCursor.Start((pageNumber - 1) * 10)
            };

The above code shows getting 10 results, starting from the 11th. The pageNumbers variable was 2, so (pageNumber – 1) * 10 would mean 10. The default 0 i.e from the beginning.

Another useful option is specifying the Fields to retrieve. Think of this like specifying the columns to retrieve in SQL statement instead of all i.e SELECT col1, col2 instead of SELECT *.

var options = new QueryOptions()
{
    Fields = new[] { "col1", "col2" }
};

I am hoping this blog post helps someone.

BTW, LMAO! Funny seeing little scumbags of planet earth using some powerful spying equipment and they trying to pass commands. The scumbags/pests/leeches and sl*ts with the equipment have false prestige and false propaganda.

I don’t have any fake aliases, nor any virtual aliases like some of the the psycho spy R&AW traitors of India. NOT associated – “ass”, eass, female “es”, “eka”, “ok”, “okay”, “is”, “erra”, yerra, karan, kamalakar, diwakar, kareem, karan, sowmya, zinnabathuni, bojja srinivas (was a friend and batchmate 1998 – 2002), mukesh golla (was a friend and classmate 1998 – 2002), thota veera, uttam’s, bandhavi’s, bhattaru’s, thota’s, bojja’s, bhattaru’s or Arumilli srinivas or Arumilli uttam (may be they are part of a different Arumilli family – not my family).

Mr. Kanti Kalyan Arumilli

Arumilli Kanti Kalyan, Founder & CEO
Arumilli Kanti Kalyan, Founder & CEO

B.Tech, M.B.A

Facebook

LinkedIn

Threads

Instagram

Youtube

Founder & CEO, Lead Full-Stack .Net developer

ALight Technology And Services Limited

ALight Technologies USA Inc

Youtube

Facebook

LinkedIn

Phone / SMS / WhatsApp on the following 3 numbers:

+91-789-362-6688, +1-480-347-6849, +44-07718-273-964

+44-33-3303-1284 (Preferred number if calling from U.K, No WhatsApp)

kantikalyan@gmail.com, kantikalyan@outlook.com, admin@alightservices.com, kantikalyan.arumilli@alightservices.com, KArumilli2020@student.hult.edu, KantiKArumilli@outlook.com and 3 more rarely used email addresses – hardly once or twice a year.

Categories
.Net C# Solr

Some performance tips when ingesting documents into Solr

As mentioned in several blog posts earlier, I have been building PodDB on Microsoft.Net platform and Solr. Solr is built on top of Apache Lucene.

Lucene.Net is a very high-performance library for working directly with Apache Lucene, SolrNet is a library for working with Solr. Solr is very customizable, fault-tolerant and has several additional features available out of the box and is built on top of Lucene. Working with SolrNet can be a bit slow because all the API calls are routed via a REST API. The usual overhead of establishing network connection, serializing and deserializing JSON or XML.

Over the past few days, I have been working on a small subset of documents (approximately 275 – 300, the same would be part of the Alpha release) and trying to tweak the settings for optimal search relevance. This required trying various Solr configurations, re-indexing data etc… The very first version of the data ingestion component (does much more pre-processing rather than just ingesting into solr) used to take about approximately 10 minutes. And now the performance has been optimized and the ingestion happens within 15 seconds. i.e over 4000% performance gain and entirely programming related.

The trick used was one of the oldest tricks in the book – batch processing. Instead of one document at a time for writing into a MySQL database and writing into Solr, I rewrote the application to ingest in batches and the application was much faster.

Batching with multi-threading might be even faster.

In other words instead of calling solr.Add() for each document, create the documents, hold them in a list, call solr.AddRange().

Similarly for solr.Commit() and solr.Optimize() batch the calls i.e call those methods once for every 1000 or so documents rather than every document.

Assuming doc is a Solr document that needs to be written. For example:

//NO
solr.Add(doc1);
solr.Add(doc2);
solr.Add(doc3);

//YES
var lst = new List<ENTITY>();
lst.Add(doc1);
lst.Add(doc2);
lst.Add(doc3);

solr.AddRange(lst);

I like to share knowledge, I am hoping this blog post helps someone.

My 2 cents to the world of the blogosphere!

I don’t have any fake aliases, nor any virtual aliases like some of the the psycho spy R&AW traitors of India. NOT associated – “ass”, eass, female “es”, “eka”, “ok”, “okay”, “is”, “erra”, yerra, karan, kamalakar, diwakar, kareem, karan, sowmya, zinnabathuni, bojja srinivas (was a friend and batchmate 1998 – 2002), mukesh golla (was a friend and classmate 1998 – 2002), thota veera, uttam’s, bandhavi’s, bhattaru’s, thota’s, bojja’s, bhattaru’s or Arumilli srinivas or Arumilli uttam (may be they are part of a different Arumilli family – not my family).

Mr. Kanti Kalyan Arumilli

Arumilli Kanti Kalyan, Founder & CEO
Arumilli Kanti Kalyan, Founder & CEO

B.Tech, M.B.A

Facebook

LinkedIn

Threads

Instagram

Youtube

Founder & CEO, Lead Full-Stack .Net developer

ALight Technology And Services Limited

ALight Technologies USA Inc

Youtube

Facebook

LinkedIn

Phone / SMS / WhatsApp on the following 3 numbers:

+91-789-362-6688, +1-480-347-6849, +44-07718-273-964

+44-33-3303-1284 (Preferred number if calling from U.K, No WhatsApp)

kantikalyan@gmail.com, kantikalyan@outlook.com, admin@alightservices.com, kantikalyan.arumilli@alightservices.com, KArumilli2020@student.hult.edu, KantiKArumilli@outlook.com and 3 more rarely used email addresses – hardly once or twice a year.

Categories
Solr

Schemas and Configs in Solr

Solr has very extensive documentation and highly configurable. But for the purposes of getting started and keeping things simple, I am going to mention some important parts.

Solr can be operated with schema or schemaless. I personally would prefer proper schema. Solr allows storing values, skipping storage and just indexing or just storing without indexing. Similarly required or not, multi-valued, unique key etc…

Continuing from the previous blog posts of Getting started:

Getting started with Solr on Windows

Getting started with Solr on Linux!

First create a core by navigating to the solr directory and executing:

> solr.cmd create -c <NAME_OF_CORE>
// Example for creating a core known as "sample"
> solr.cmd create -c sample

Now navigate to <solr directory>\server\solr. Here you will find the sample directory. Inside sample directory, there would be a conf directory. The conf directory has the schema.xml and solrconfig.xml.

Now let’s delve into some Schema.xml:

The most important part is defining the schema, the copyField’s.

Let’s assume our schema requires a unique id, title, description and tags.

id is going to be a unique field, we want title, description and tags to be indexed. Same object can have multiple tags, so tags would be multi-valued.

We would add the following:

<field name="Title" type="text_general" indexed="true" stored="true"/>
<field name="Description" type="text_general" indexed="true" stored="true"/>
<field name="Tags" type="text_general" multiValued="true" indexed="true" stored="true"/>



<field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>

The “id” field would exist, we don’t want to add another “id” field, verify the syntax matches. Now we want to copy the values from Title, Description, Tags into _text_ and want _text_ to be indexed. Some people would try to copy * into _text_ as in all the fields. But in some schemas there may be meta information such as dates etc… So specifying which particular fields need to be copied would help. The next code block shows the copyfields.

<copyField source="Title" dest="_text_" />
<copyField source="Description" dest="_text_" />
<copyField source="Tags" dest="_text_" />

Now let’s delve into solrconfig.xml:

Search for <requestHandler name=”/select” class=”solr.SearchHandler”>

Inside this tag we can specify default page size, default field to index, any default facets etc. Let’s assume we want to set default page size of 10, default field to search of _text_ and faceting on Tags.

<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">_text_</str>

      <str name="facet">on</str>
      <str name="facet.field">Tags</str>
    </lst>
  </requestHandler>

If config or schema changes, the core needs to be reloaded.

More Solr related Articles:

Schemas and Configs in Solr

Getting started with Solr on Windows

Getting started with Solr on Linux!

Hoping this blog post helps someone.

Categories
Solr

Getting started with Solr on Linux!

As promised in a previous blog post Getting started with Solr on Windows, posted few minutes ago, here is the Getting started on Linux.

First install Java. This blog post is based on Ubuntu 20.04 server.

> sudo apt update
> sudo apt upgrade
> sudo apt install default-jdk -y
> sudo wget https://dlcdn.apache.org/solr/solr/9.0.0/solr-9.0.0.tgz
> sudo tar xzf solr-9.0.0.tgz
> sudo bash solr-9.0.0/bin/install_solr_service.sh solr-9.0.0.tgz

Then navigate to http://localhost:8983/solr/ you should see a webpage that looks similar to:

Solr Admin Ubuntu

If you need access from a different computer, remember to allow port 8983 with the following command:

> sudo ufw allow 8983

Now we have seen Getting started with Solr on Windows and Getting started with Solr on Linux. In a future blog post, we will see how to create cores, define schema, add documents for indexing using C#, how to search through the documents using C#. This blog post would be updated with the links to the relevant blog articles when those blog posts are published.

More Solr related Articles:

Schemas and Configs in Solr

Getting started with Solr on Windows

Getting started with Solr on Linux!

Here are some excellent getting started documentation from Solr:

https://solr.apache.org/guide/solr/latest/getting-started/solr-tutorial.html

For production grade deployment:

https://solr.apache.org/guide/solr/latest/deployment-guide/solr-control-script-reference.html

https://solr.apache.org/guide/solr/latest/configuration-guide/configuration-files.html

Categories
Solr

Getting started with Solr on Windows

This blog post is purely about getting started with Solr. A programming related blog post would be posted later. Getting started on Linux would also be posted later.

Download the latest version of Solr from https://solr.apache.org/downloads.html. As of this blog post the latest release was 9.0.0 and I have downloaded the Binary release from https://www.apache.org/dyn/closer.lua/solr/solr/9.0.0/solr-9.0.0.tgz?action=download.

Once downloaded extract the files. For this blog post, I have extracted the files on Windows under C:\solr.

Solr requies Java. Download and install Java from https://www.oracle.com/java/technologies/downloads/. Set the environment variable JAVA_HOME to the JRE directory.

JAVA_HOME environment variable on Windows.

Now navigate to C:\Solr in command prompt and enter the following:

bin\solr.cmd start

This should start Solr, if not, verify your Java installation and environment variable.

Now navigate to http://localhost:8983/ in your web browser. You should see a web page that looks something like this:

Solr Admin Web Page

You might not see the Core Selector dropdown but the web page should look similar.

Getting started with Solr on Linux.

In a future blog post, we will see how to create cores, define schema, add documents for indexing using C#, how to search through the documents using C#. This blog post would be updated with the links to the relevant blog articles when those blog posts are published.

More Solr related Articles:

Schemas and Configs in Solr

Getting started with Solr on Windows

Getting started with Solr on Linux!

Here are some excellent getting started documentation from Solr:

https://solr.apache.org/guide/solr/latest/getting-started/solr-tutorial.html

For production grade deployment:

https://solr.apache.org/guide/solr/latest/deployment-guide/solr-control-script-reference.html

https://solr.apache.org/guide/solr/latest/configuration-guide/configuration-files.html

Categories
.Net C# Lucene Solr

Lucene vs Solr

I played around with Lucene.Net and Solr. Solr is built on top of Lucene.

Lucene.Net is a port of Lucene library written in C# for working with Lucene on Microsoft .Net stack.

Lucene is a library built by Apache Software Foundation. Lucene provides full-text search capabilities. There are few other alternatives such as Sphinx, full-text search capabilities built into RDBMS’s such as Microsoft SQL Server, MySQL, MariaDB, PostgreSQL etc… However, full-text search capabilities in RDBMS’s are not as efficient as Lucene.

Solr and ElasticSearch are built on top of Lucene. ElasticSearch is more suitable and efficient for time-series data.

Now let’s see more about Solr vs Lucene.

Solr provides some additional features such as replication, web app GUI, collecting and publishing metrics, fault-tolerant etc… Solr provides HTTP REST-based API’s for management and for adding documents, searching documents etc…

Directly working with Lucene would provide access to more fine-grained control.

Because Solr provides REST based API’s there is the overhead of establishing HTTP connection, formatting the requests, JSON serialization, and deserialization at both ends i.e client making the call and the Solr server. By directly working with Lucene this overhead does not exist.

If searching through the documents happens on the same server, working directly with Lucene might be efficient. Specifically in lesser data scenarios, but if huge datasets and scaling are a concern, Solr might be the proper approach.

If server infrastructure requirements require separate search servers and a bunch of application servers query the search servers for data, Solr might be more useful and easier because of existing support replication and HTTP API’s.

If performance is of the highest importance and still fine-grained control is needed, custom-built applications should expose the data from search servers and some other more efficient protocols such as gRPC could be used and obviously, replication mechanisms need to be custom-built.