Categories
.Net C# Lucene MultiFieldQueryParser

Lucene with C#

As part of the development effort for PodDB, a search engine for podcasts and a product of ALight Technology And Services Limited, I had to choose between Lucene.Net and Solr. If you want to scale, I strongly suggest Solr over Lucene.Net: choose Solr if you expect larger datasets or want built-in sharding and replication out of the box. For smaller datasets that you know will not grow much, Lucene.Net is not a problem and is, as a matter of fact, very efficient. That said, I do plan to scale PodDB if it gains traction, so I chose Solr.

But for the sake of knowledge sharing, this article shows how to use Lucene.Net for full-text indexing. I will not go over complex scenarios, but this article is also NOT a Hello World for Lucene.Net.

Moreover, Lucene.Net does not seem to be under active development: as of this post (September 13th, 2022), there have been no GitHub commits in the past 2 to 3 months. As software developers, technical leads, and architects, we are responsible for making the right choices for the underlying technology stack. Although ALight Technology And Services Limited is not yet an enterprise, I still want to make decisions that hold up over the long term.

Now let’s dig into some code.

Lucene.Net 4.8 is still in pre-release (beta), so install the pre-release versions of the Lucene.Net NuGet packages.

Because Lucene.Net is in beta and there could be lots of breaking changes, declare the compatibility version in code. Lucene uses this version to keep analyzer and index behavior consistent across releases.

private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

Now we specify the directory where the index should be written, along with some initialization code.

var dir = FSDirectory.Open(indexDirectory);
var analyzer = new StandardAnalyzer(AppLuceneVersion);
var indexConfig = new IndexWriterConfig(AppLuceneVersion, analyzer);
var writer = new IndexWriter(dir, indexConfig);
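For reference, here is the same initialization as a self-contained helper with its using directives. It is a sketch that assumes the pre-release NuGet packages Lucene.Net and Lucene.Net.Analysis.Common (which contains StandardAnalyzer) are installed; the IndexFactory class and OpenWriter method are hypothetical names.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class IndexFactory
{
    private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

    // Opens (or creates) the index at indexDirectory and returns a writer for it.
    // The caller owns the writer and should dispose it; disposing an IndexWriter
    // in Lucene 4.8 commits pending changes and waits for merges to finish.
    public static IndexWriter OpenWriter(string indexDirectory)
    {
        var dir = FSDirectory.Open(indexDirectory);
        var analyzer = new StandardAnalyzer(AppLuceneVersion);
        var indexConfig = new IndexWriterConfig(AppLuceneVersion, analyzer);
        return new IndexWriter(dir, indexConfig);
    }
}
```

Typical usage is `using var writer = IndexFactory.OpenWriter(indexDirectory);`, so the writer is disposed (and its changes committed) when it goes out of scope.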

Now we use the IndexWriter to write documents. Two types of string fields are particularly important:

  1. TextField – The string data is analyzed (tokenized) and indexed for full-text search.
  2. StringField – The data is not analyzed for full-text search; it is indexed as a single exact value, which suits fields such as ids.
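To make the difference concrete, here is a small sketch that indexes one document in memory (RAMDirectory) and runs exact TermQuery lookups against each field. It assumes the Lucene.Net 4.8 pre-release packages; the class name, sample title, and "EP-42" id are made up for illustration.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class FieldTypeDemo
{
    private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

    // Indexes one sample document in memory and counts the hits for an exact term on one field.
    public static int CountMatches(string field, string term)
    {
        using var dir = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(AppLuceneVersion);

        using (var writer = new IndexWriter(dir, new IndexWriterConfig(AppLuceneVersion, analyzer)))
        {
            writer.AddDocument(new Document
            {
                // TextField: analyzed, so the title is indexed as "lucene", "podcast", "search".
                new TextField("Title", "Lucene Podcast Search", Field.Store.YES),
                // StringField: indexed as the single, unchanged token "EP-42".
                new StringField("Id", "EP-42", Field.Store.YES)
            });
        } // Disposing the writer commits the document.

        using var reader = DirectoryReader.Open(dir);
        var searcher = new IndexSearcher(reader);

        // TermQuery does not analyze its input, so the term must equal an indexed token.
        return searcher.Search(new TermQuery(new Term(field, term)), 10).TotalHits;
    }
}
```

Here `CountMatches("Title", "podcast")` finds the document while `"Podcast"` does not, because StandardAnalyzer lowercased the title; the Id only matches as the exact string "EP-42".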

Based on these types, decide which data needs full-text search, which data only needs exact matching, and which data needs to be stored in Lucene (Field.Store.YES) so it can be read back from search results.

var doc = new Document
{
    new TextField("Title", "Some Data", Field.Store.YES),
    new TextField("Description", "Description", Field.Store.YES),
    new StringField("Id", id, Field.Store.YES)
};

You can add as many TextField and StringField instances as needed. You can also create separate TextField and StringField instances and add them with doc.Add().

To tune the ranking of search results, you can also set the Boost of a TextField. By default, the boost (the weight given to a field) is 1.0, but you can give a particular field a higher weight. For example, if a keyword appears in the title, you might want that document to rank higher.
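A minimal sketch of an index-time boost on the title, assuming Lucene.Net 4.8, where Field exposes a settable Boost property. The factory name and the 2.0 factor are illustrative only. Boost TextField instances only: Lucene 4.x rejects an index-time boost on fields that omit norms, which includes StringField.

```csharp
using Lucene.Net.Documents;

public static class BoostedDocumentFactory
{
    // Builds a document whose Title field carries an index-time boost.
    // 2.0f is an illustrative value; the default boost is 1.0f.
    public static Document Create(string id, string title, string description)
    {
        var titleField = new TextField("Title", title, Field.Store.YES)
        {
            Boost = 2.0f // matches in Title now weigh more when scoring
        };

        return new Document
        {
            titleField,
            new TextField("Description", description, Field.Store.YES),
            new StringField("Id", id, Field.Store.YES) // no boost: StringField omits norms
        };
    }
}
```

If you would rather keep the index unchanged and weight fields at query time instead, MultiFieldQueryParser also has a constructor overload that accepts a dictionary of per-field boosts.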

Add the document to the writer and call Flush():

writer.AddDocument(doc);
writer.Flush(triggerMerge: false, applyAllDeletes: false);

For speed and efficiency, batch documents and call Flush once per batch instead of once per document.
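Here is one way to batch, as a sketch: add every document first, then commit once. It swaps the per-document Flush for a single Commit, which also makes the batch durable and visible to readers opened afterwards; Flush alone writes buffered documents to the directory without committing them. The Episode record and IndexAll method are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Util;

public static class BatchIndexer
{
    private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

    // Hypothetical record used only for this sketch.
    public record Episode(string Id, string Title, string Description);

    // Fully qualified Directory avoids a clash with System.IO.Directory under implicit usings.
    public static void IndexAll(Lucene.Net.Store.Directory dir, IEnumerable<Episode> episodes)
    {
        using var analyzer = new StandardAnalyzer(AppLuceneVersion);
        using var writer = new IndexWriter(dir, new IndexWriterConfig(AppLuceneVersion, analyzer));

        foreach (var e in episodes)
        {
            writer.AddDocument(new Document
            {
                new TextField("Title", e.Title, Field.Store.YES),
                new TextField("Description", e.Description, Field.Store.YES),
                new StringField("Id", e.Id, Field.Store.YES)
            });
        }

        // One commit for the whole batch instead of one Flush per document.
        writer.Commit();
    }
}
```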

Assuming you have built your index, let’s retrieve documents.

using var dir = FSDirectory.Open(indexPath);
var analyzer = new StandardAnalyzer(AppLuceneVersion);

var indexConfig = new IndexWriterConfig(AppLuceneVersion, analyzer);
using var writer = new IndexWriter(dir, indexConfig);
using var lreader = writer.GetReader(applyAllDeletes: true);
var searcher = new IndexSearcher(lreader);

// Id is a StringField (one unanalyzed token), so a single-term query matches it exactly
var exactQuery = new PhraseQuery();
exactQuery.Add(new Term("Id", id));
var search = searcher.Search(exactQuery, null, 1);
var docs = search.ScoreDocs;

if (docs?.Length == 1)
{
    // Load the stored fields of the matching document
    Document d = searcher.Doc(docs[0].Doc);
    var title = d.Get("Title");
}

The code above retrieves a document by its Id; it is not a full-text search. The first few lines are standard initializers. Then we instantiate a PhraseQuery with a single term on the “Id” field; because Id is a StringField, the term must match the stored value exactly. If there is a match, we read the Title of the matching document.
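For exact lookups on a StringField, a TermQuery is a simpler alternative to a one-term PhraseQuery, and a read-only DirectoryReader is enough when you are not also writing. The sketch below shows that swap; note that DirectoryReader.Open only sees committed changes, whereas writer.GetReader also sees uncommitted ones. IdLookup and GetTitleById are hypothetical names.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class IdLookup
{
    // Returns the stored Title of the document whose Id equals `id`, or null when nothing matches.
    public static string? GetTitleById(Lucene.Net.Store.Directory dir, string id)
    {
        // A read-only reader is enough for lookups; no IndexWriter is opened here.
        using var reader = DirectoryReader.Open(dir);
        var searcher = new IndexSearcher(reader);

        // TermQuery matches the single unanalyzed token that StringField indexed.
        var hits = searcher.Search(new TermQuery(new Term("Id", id)), 1).ScoreDocs;
        return hits.Length == 1 ? searcher.Doc(hits[0].Doc).Get("Title") : null;
    }
}
```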

Now let’s see how we can search based on Title and Description as mentioned above:

using var dir = FSDirectory.Open(indexPath);
var analyzer = new StandardAnalyzer(AppLuceneVersion);

var indexConfig = new IndexWriterConfig(AppLuceneVersion, analyzer);
using var writer = new IndexWriter(dir, indexConfig);
using var lreader = writer.GetReader(applyAllDeletes: true);
var searcher = new IndexSearcher(lreader);

// Fields the full-text query should run against
string[] fnames = { "Title", "Description" };
var multiFieldQP = new MultiFieldQueryParser(AppLuceneVersion, fnames, analyzer);

Query query = multiFieldQP.Parse("My Search Term");
var search = searcher.Search(query, null, 10);

Console.WriteLine(search.TotalHits);

ScoreDoc[] docs = search.ScoreDocs;
foreach (var scoreDoc in docs)
{
    // Load the stored fields of each hit
    Document d = searcher.Doc(scoreDoc.Doc);

    var id = d.Get("Id");
    var title = d.Get("Title");
    var description = d.Get("Description");
}

In the code above, the first few lines are the standard initializers. The fnames variable lists the fields to search. We then instantiate a MultiFieldQueryParser so the search runs across multiple fields, and build the query from the search term; advanced boolean queries can also be created at this step. Finally, the search runs with a limit on the number of results (10 here), and the rest of the code reads the stored field values.
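As a sketch of the advanced boolean queries mentioned above, the parsed multi-field query can be combined with other clauses in a BooleanQuery. This example adds a hypothetical exclusion on Id and escapes the user input with QueryParserBase.Escape so characters such as ':' or '(' are treated as text rather than query syntax. It assumes the Lucene.Net.QueryParser package is installed alongside Lucene.Net.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Util;

public static class CombinedQueryBuilder
{
    private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

    // Builds "+(parsed user text over Title/Description) -Id:excludedId".
    public static Query Build(Analyzer analyzer, string userText, string excludedId)
    {
        string[] fnames = { "Title", "Description" };
        var parser = new MultiFieldQueryParser(AppLuceneVersion, fnames, analyzer);

        // Escape query syntax characters so raw user input is searched as plain text.
        Query textQuery = parser.Parse(QueryParserBase.Escape(userText));

        var combined = new BooleanQuery();
        combined.Add(textQuery, Occur.MUST);                                      // must match the text
        combined.Add(new TermQuery(new Term("Id", excludedId)), Occur.MUST_NOT);  // hypothetical exclusion
        return combined;
    }
}
```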

I hope this article helps someone.

Categories
.Net C# Lucene Solr

Lucene vs Solr

I played around with Lucene.Net and Solr. Solr is built on top of Lucene.

Lucene.Net is a C# port of the Lucene library for working with Lucene on the Microsoft .Net stack.

Lucene is a full-text search library from the Apache Software Foundation. There are a few alternatives, such as Sphinx and the full-text search features built into RDBMSs like Microsoft SQL Server, MySQL, MariaDB, and PostgreSQL. However, the full-text search capabilities of RDBMSs are generally not as efficient as Lucene.

Solr and Elasticsearch are both built on top of Lucene. Elasticsearch is generally considered more suitable and efficient for time-series data.

Now let’s see more about Solr vs Lucene.

Solr adds features on top of Lucene, such as replication, a web-based admin UI, metrics collection and publishing, and fault tolerance. It also exposes HTTP REST-based APIs for management, adding documents, searching documents, and so on.

Working with Lucene directly gives you more fine-grained control.

Because Solr is accessed through REST APIs, every request carries overhead: establishing the HTTP connection, formatting the request, and JSON serialization and deserialization on both ends (the client making the call and the Solr server). Working directly with Lucene avoids this overhead.
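To make that overhead concrete, a Solr search from C# is an HTTP round trip to the /select request handler, with the query in the q parameter. This sketch assumes a Solr instance at a hypothetical base URL such as http://localhost:8983/solr/podcasts (8983 is Solr's default port; the "podcasts" core name is made up) and returns the raw JSON without parsing it.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SolrSearchClient
{
    // Reuse one HttpClient so connections can be pooled across requests.
    private static readonly HttpClient Http = new HttpClient();

    // Builds a /select URL; q and rows are standard Solr query parameters.
    public static string BuildSelectUrl(string baseUrl, string userQuery, int rows) =>
        $"{baseUrl.TrimEnd('/')}/select?q={Uri.EscapeDataString(userQuery)}&rows={rows}&wt=json";

    // Each call pays for an HTTP round trip plus JSON serialization on the server
    // and, once you parse the body, deserialization on the client.
    public static async Task<string> SearchAsync(string baseUrl, string userQuery, int rows = 10)
    {
        using var response = await Http.GetAsync(BuildSelectUrl(baseUrl, userQuery, rows));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Reusing the HttpClient amortizes connection setup, but the serialization cost is still paid on every request.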

If documents are searched on the same server that hosts the index, working directly with Lucene can be more efficient, especially for smaller datasets. If datasets are huge and scaling is a concern, Solr is likely the better approach.

If your infrastructure calls for dedicated search servers that many application servers query, Solr is likely more useful and easier to adopt because of its built-in replication and HTTP APIs.

If performance matters most and you still need fine-grained control, you could build custom applications that expose the data from the search servers over a more efficient protocol such as gRPC; in that case, the replication mechanisms also have to be custom-built.

Categories
.Net C# UnitTests

Moq Non-Invocable Test Setups

Many of you probably know the Moq library for unit tests in .Net. Moq is commonly used to stub interfaces and verify that methods are called with the expected parameters, but it can also verify that a certain method was not called.

Assume you have an interface ISpecialInterface and expect ISpecialInterface.Method() to be called only under certain logic, such as an if condition. When you write a unit test for the case where that condition is false, you want to verify that ISpecialInterface.Method() is not invoked.

var mockSpecialInterface = new Mock<ISpecialInterface>(MockBehavior.Strict);

mockSpecialInterface.Setup(ss => ss.Method(It.IsAny<string>()));

var sut = new SpecialObject(mockSpecialInterface.Object);
sut.LogicMethod();

mockSpecialInterface.Verify(ss => ss.Method(It.IsAny<string>()), Times.Never());

In the above code, mockSpecialInterface.Verify(ss => ss.Method(It.IsAny<string>()), Times.Never()); is the code that verifies the method is not called.
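Here is a complete sketch of the same idea as an xUnit test file, assuming the Moq and xunit packages. ISpecialInterface, SpecialObject, and its condition flag are hypothetical stand-ins for the types in the text. A second test checks the opposite branch with Times.Once, so the Never assertion is not passing by accident.

```csharp
using Moq;
using Xunit;

// Hypothetical stand-ins for the types described in the text.
public interface ISpecialInterface
{
    void Method(string value);
}

public class SpecialObject
{
    private readonly ISpecialInterface _special;
    private readonly bool _condition;

    public SpecialObject(ISpecialInterface special, bool condition = false)
    {
        _special = special;
        _condition = condition;
    }

    // Calls Method only when the condition holds.
    public void LogicMethod()
    {
        if (_condition)
        {
            _special.Method("triggered");
        }
    }
}

public class SpecialObjectTests
{
    [Fact]
    public void LogicMethod_WhenConditionIsFalse_DoesNotCallMethod()
    {
        var mock = new Mock<ISpecialInterface>(MockBehavior.Strict);
        mock.Setup(ss => ss.Method(It.IsAny<string>()));

        new SpecialObject(mock.Object, condition: false).LogicMethod();

        mock.Verify(ss => ss.Method(It.IsAny<string>()), Times.Never());
    }

    [Fact]
    public void LogicMethod_WhenConditionIsTrue_CallsMethodOnce()
    {
        var mock = new Mock<ISpecialInterface>(MockBehavior.Strict);
        mock.Setup(ss => ss.Method(It.IsAny<string>()));

        new SpecialObject(mock.Object, condition: true).LogicMethod();

        mock.Verify(ss => ss.Method("triggered"), Times.Once());
    }
}
```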

Categories
.Net C#

Programmatically configuring NLog in C#

Some people, for various reasons, prefer to configure NLog programmatically. One use case: you may not want to store sensitive information in the nlog.config file, yet some targets need sensitive information in their configuration, such as the connection string for the Database target.

You might want to store the sensitive information somewhere else in an encrypted format, then decrypt it at startup and configure the logger programmatically. Here is a code sample showing how to configure such loggers:

using NLog;
using NLog.Config;
using NLog.Targets;

var logConfig = new LoggingConfiguration();

// File target
var fileTarget = new FileTarget
{
    FileName = typeof(Program).FullName + ".log"
};

fileTarget.Layout = @"${date:format=HH\:mm\:ss} ${logger}:${message};${exception}";

var fileRule = new LoggingRule("*", LogLevel.Debug, fileTarget);
logConfig.LoggingRules.Add(fileRule);
logConfig.AddTarget("logfile", fileTarget);

// Database target (in NLog 5 and later it ships in the separate NLog.Database package)
var dbTarget = new DatabaseTarget();

// Obtain the connection string from a secure source instead of nlog.config
dbTarget.ConnectionString = YourSecureMethodForDecryptingAndObtainingConnectionString();

dbTarget.CommandText = @"INSERT INTO [Log] (Date, Thread, Level, Logger, Message, Exception) VALUES (GETDATE(), @thread, @level, @logger, @message, @exception)";

dbTarget.Parameters.Add(new DatabaseParameterInfo("@thread", new NLog.Layouts.SimpleLayout("${threadid}")));

// ... add the remaining parameters (@level, @logger, @message, @exception) the same way

logConfig.AddTarget("database", dbTarget);
var dbRule = new LoggingRule("*", LogLevel.Debug, dbTarget);
logConfig.LoggingRules.Add(dbRule);

// Apply the configuration; loggers obtained from LogManager now use these targets and rules
LogManager.Configuration = logConfig;

The sample above shows how to add multiple types of targets (File and Database), how to set the layout for the FileTarget, how to configure logging rules, and finally how to assign the programmatic configuration.
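The sample leaves YourSecureMethodForDecryptingAndObtainingConnectionString() to the reader. As one hypothetical stand-in, the sketch below reads the connection string from an environment variable (APP_LOG_DB_CONNECTION is an assumed name) that your deployment or secret store populates, and fails loudly if it is missing. Decryption is not shown.

```csharp
using System;

public static class LogSecrets
{
    // Hypothetical variable name; populate it from a secret store or deployment pipeline, not nlog.config.
    private const string ConnectionStringVariable = "APP_LOG_DB_CONNECTION";

    // Stand-in for YourSecureMethodForDecryptingAndObtainingConnectionString().
    public static string GetLogDbConnectionString()
    {
        var value = Environment.GetEnvironmentVariable(ConnectionStringVariable);
        if (string.IsNullOrWhiteSpace(value))
        {
            // Fail at startup rather than silently logging nowhere.
            throw new InvalidOperationException(
                $"Environment variable {ConnectionStringVariable} is not set.");
        }
        return value;
    }
}
```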

An interesting target is the Memory target, which writes log messages to an in-memory list (its Logs property) for programmatic retrieval. It is great for unit testing.

The NLog documentation for the Memory target includes code samples.
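A minimal sketch of the Memory target in a unit-test style, assuming NLog 4.5 or later (for LoggingConfiguration.AddRule). MemoryTargetDemo and CaptureLogs are hypothetical names; the layout keeps only the level and message so the captured lines are easy to assert on.

```csharp
using System.Collections.Generic;
using NLog;
using NLog.Config;
using NLog.Targets;

public static class MemoryTargetDemo
{
    // Routes log events to an in-memory target and returns what was captured.
    public static IList<string> CaptureLogs()
    {
        var memoryTarget = new MemoryTarget { Name = "memory", Layout = "${level}|${message}" };

        var config = new LoggingConfiguration();
        config.AddRule(LogLevel.Debug, LogLevel.Fatal, memoryTarget);
        LogManager.Configuration = config;

        var logger = LogManager.GetLogger("Demo");
        logger.Info("hello");
        logger.Trace("below the minimum level, so not captured");

        return memoryTarget.Logs;
    }
}
```

Calling CaptureLogs() should return a single entry, "Info|hello", because the Trace message is below the rule's minimum level.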

Categories
AWS C# Github

New accompanying GitHub repository!

A new GitHub repository has been created at https://github.com/ALightTechnologyAndServicesLimited/Internal to hold code samples for all future content of ALight Technology And Services Limited’s technical blog and technical videos.

YouTube channels:

www.youtube.com/channel/UCfWg1fhujnIf6b621UZ_SGg

www.youtube.com/channel/UCBuu5ksejp5uPIJmPuReSTA

Happy development. 🙂