
Lucene vs Solr

I played around with Lucene.Net and Solr. Solr is built on top of Lucene.

Lucene.Net is a C# port of the Lucene library, for working with Lucene on the Microsoft .NET stack.

Lucene is a library from the Apache Software Foundation that provides full-text search capabilities. There are a few alternatives, such as Sphinx and the full-text search features built into RDBMSs like Microsoft SQL Server, MySQL, MariaDB, and PostgreSQL. However, the full-text search in an RDBMS is generally not as efficient as Lucene.

Solr and Elasticsearch are both built on top of Lucene. Elasticsearch is often considered the more suitable and efficient choice for time-series data such as logs and metrics.

Now let’s look more closely at Solr vs Lucene.

Solr adds features on top of Lucene, such as replication, a web admin GUI, metrics collection and publishing, and fault tolerance. It also exposes HTTP REST-style APIs for management, adding documents, and searching documents.
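To make the HTTP API point concrete, here is a minimal Python sketch that builds (but does not send) an add-documents request and a search request. The host and port, the core name `articles`, and the `id`/`title` fields are assumptions for illustration. The `/update` and `/select` paths are Solr's standard handlers, but check them against your own Solr version and configuration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: a local Solr on its default port
CORE = "articles"                          # hypothetical core name


def build_add_request(docs):
    """Build (but do not send) a POST that adds documents as a JSON array."""
    url = f"{SOLR_BASE}/{CORE}/update?" + urlencode({"commit": "true"})
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def build_search_request(query, rows=10):
    """Build (but do not send) a GET against the /select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return Request(f"{SOLR_BASE}/{CORE}/select?{params}", method="GET")


if __name__ == "__main__":
    add = build_add_request([{"id": "1", "title": "Lucene vs Solr"}])
    search = build_search_request("title:lucene", rows=5)
    print(add.get_method(), add.full_url)
    print(search.get_method(), search.full_url)
```

Passing either object to `urllib.request.urlopen` would actually send it to a running Solr instance. Any language with an HTTP client can do the same, which is a large part of Solr's convenience.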

Working directly with Lucene gives more fine-grained control.

Because Solr is accessed through REST-style APIs, every call carries the overhead of establishing an HTTP connection, formatting the request, and JSON serialization and deserialization at both ends, i.e. the client making the call and the Solr server. Working directly with Lucene avoids this overhead.
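The serialization part of that overhead can be illustrated with a rough Python sketch. It involves neither Lucene nor Solr: `DOCS` and both search functions are hypothetical stand-ins, and the network connection cost is left out entirely, so a real HTTP call would cost more than what is measured here.

```python
import json
import timeit

# Hypothetical in-memory "index": a stand-in for a real search backend.
DOCS = [{"id": i, "title": f"document {i}"} for i in range(1000)]


def search_in_process(term):
    """Library-style call: results come back as native objects."""
    return [d for d in DOCS if term in d["title"]]


def search_via_json(term):
    """Same search, with request and response passed through JSON,
    as they would be on an HTTP API (minus the network itself)."""
    request = json.dumps({"q": term})                        # client encodes
    parsed = json.loads(request)                             # server decodes
    response = json.dumps(search_in_process(parsed["q"]))    # server encodes
    return json.loads(response)                              # client decodes


if __name__ == "__main__":
    direct = timeit.timeit(lambda: search_in_process("99"), number=200)
    via_json = timeit.timeit(lambda: search_via_json("99"), number=200)
    print(f"in-process: {direct:.4f}s, via JSON: {via_json:.4f}s")
```

Both functions return the same results. The only difference is the extra encode and decode steps, which is the cost an in-process Lucene call does not pay. The actual numbers depend on the machine and on result size.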

If searching happens on the same server as the application, working directly with Lucene might be more efficient, especially with smaller datasets. If huge datasets and scaling are a concern, Solr might be the better approach.

If the infrastructure calls for separate search servers, with a number of application servers querying them for data, Solr might be more useful and easier to adopt because of its existing support for replication and its HTTP APIs.

If performance is of the highest importance and fine-grained control is still needed, a custom-built application can expose the data from the search servers over a more efficient protocol such as gRPC. In that case, replication mechanisms would also need to be custom-built.