Categories
A.I

An introduction to vector databases and having LLMs generate responses

Cross-Post: https://www.linkedin.com/pulse/introduction-vector-databases-having-llms-generate-arumilli-kg0tc

In previous posts, I covered what vector embeddings are, why they are needed, how to chunk text, some chunking methods, and the importance of maintaining context. This post serves as an introduction to vector databases.

There are several databases that support storing and retrieving vector embeddings. Some dedicated vector databases include Chroma, Qdrant, Vespa, LanceDB, etc. Others such as Solr, PostgreSQL, MongoDB Atlas (not the free community version), and Redis also support vectors to varying degrees.

Some important considerations when deciding on a vector database, similar to any other database:

1) Client SDK availability or using microservices

2) Sharding of data, replication for higher availability, and horizontal scaling

3) Ease of making backups of databases, creating snapshots, and restoring them

4) Types of vectors supported by the database

Some databases are easy to get started with but may not be suitable for production environments that require sharding, replication, and horizontal scaling.

Once the data has been chunked, stored, and indexed, the next steps are retrieving the appropriate documents from the database based on the query (embeddings need to be generated for the query as well) and passing them to an LLM.
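The retrieve-then-prompt flow can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular vector database works internally: `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts words), and `retrieve` and `build_prompt` are names made up for this sketch.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Embed the query and return the top_k most similar stored chunks."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be passed to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

A real setup would replace `toy_embed` with calls to an embedding model and let the vector database do the ranking; only the overall shape (embed the query, rank, build the prompt) carries over.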

But production scenarios need a few extra steps:

  1. Using tools such as LLM Guard to prevent malicious usage, e.g. a well-crafted query can be a prompt that asks the LLM to override its system message. This is known as prompt injection.
  2. Having LLMs respond in a format that can be parsed
  3. Caching of responses
  4. In some scenarios, running output scanners on the response
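Steps 2 and 3 can be sketched as follows. This is a minimal sketch under assumptions, not WebVeta's implementation: `ResponseCache` and `parse_llm_json` are hypothetical names, the cache lives in process memory, and the LLM is assumed to have been instructed to reply with a JSON object containing an `answer` field.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by a hash of the normalized query text."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(query: str) -> str:
        # Lower-case and collapse whitespace so trivially different queries share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_generate(self, query: str, generate):
        """Return the cached response, calling generate(query) only on a miss."""
        key = self._key(query)
        if key not in self._store:
            self._store[key] = generate(query)
        return self._store[key]


def parse_llm_json(raw: str) -> dict:
    """Parse a reply expected to be a JSON object with an 'answer' field.

    Falls back to treating the raw text as the answer when parsing fails.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {"answer": raw.strip(), "parsed": False}
    if not isinstance(parsed, dict) or "answer" not in parsed:
        return {"answer": raw.strip(), "parsed": False}
    return {"answer": str(parsed["answer"]), "parsed": True}
```

The `generate` argument is whatever function actually calls the LLM, so a cache hit costs no tokens. A production cache would also need expiry and shared storage.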

WebVeta does this and much more, including cost controls. WebVeta can be easily integrated into any website with just a few lines of HTML!

https://webveta.alightservices.com

WebVeta is running a Kickstarter campaign. Join the campaign and get 25 – 40% discounts and a price lock for at least 3 years.

https://www.kickstarter.com/projects/kantikalyanarumilli/webveta-power-your-website-with-ai-search

Mr. Kanti Kalyan Arumilli

Arumilli Kanti Kalyan, Founder & CEO

B.Tech, M.B.A

Facebook

LinkedIn

Threads

Instagram

Youtube

Founder & CEO, Lead Full-Stack .Net developer

ALight Technology And Services Limited

ALight Technologies USA Inc

Youtube

Facebook

LinkedIn

Phone / SMS / WhatsApp on the following 3 numbers:

+91-789-362-6688, +1-480-347-6849, +44-07718-273-964

+44-33-3303-1284 (Preferred number if calling from U.K, No WhatsApp)

kantikalyan@gmail.com, kantikalyan@outlook.com, admin@alightservices.com, kantikalyan.arumilli@alightservices.com, KArumilli2020@student.hult.edu, KantiKArumilli@outlook.com and 3 more rarely used email addresses – hardly once or twice a year.

Categories
A.I

Different ways of text chunking for generating embeddings

Cross-post: https://www.linkedin.com/pulse/different-ways-text-chunking-generating-embeddings-arumilli-u8hjc

In a previous post, https://www.alightservices.com/2024/10/03/an-introduction-to-text-chunking-for-purposes-of-vector-embedding/, I talked about the reason for chunking and the concept. This post goes a little deeper and is based on another person’s blog, Five Levels of Chunking Strategies in RAG | Notes from Greg’s Video | by Anurag Mishra | Medium, and on GitHub: RetrievalTutorials/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb at main · FullStackRetrieval-com/RetrievalTutorials · GitHub

For retrieval to work properly, documents need to be chunked, and embeddings need to be generated and stored in a vector database. But when we chunk documents, how do we maintain context? What if the primary information sits in one chunk while a specific detail that depends on it ends up in a different chunk, without that context?

There are different ways of chunking.

Split by characters such as . ! ? Then combine the split sentences until the maximum chunk size is reached. This is fast and easy but does not retain context.
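A minimal sketch of this approach in Python, assuming chunk size is measured in characters (a real pipeline would more likely count tokens). The function names are made up for the example.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def chunk_by_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as its own oversized chunk.
    """
    chunks = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_by_sentences("One. Two! Three? Four.", max_chars=10)` returns `["One. Two!", "Three?", "Four."]`.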

Some people overlap some content between neighbouring chunks to maintain context.
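One way to picture the overlap is a sliding window over sentences. This is a sketch only: it assumes the text has already been split into sentences and measures chunk size in sentences rather than tokens.

```python
def chunk_with_overlap(sentences: list[str], chunk_size: int = 3, overlap: int = 1) -> list[str]:
    """Slide a window of chunk_size sentences, repeating `overlap` sentences
    from the end of each chunk at the start of the next one."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > overlap >= 0")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        window = sentences[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(sentences):
            break  # the last window already reached the end of the text
    return chunks
```

With five sentences A to E, `chunk_size=3` and `overlap=1`, the chunks are "A. B. C." and "C. D. E.", so sentence C carries context across the boundary.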

Document-based chunking, for example for HTML: chunk the document by headers. Well-written HTML documents usually keep context within a section, but there is still the question of chunk size. What if a certain segment of HTML is larger than the embedder’s maximum number of tokens?
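A rough sketch of header-based chunking using only Python's standard `html.parser` module. The class and function names are made up for this example, and it ignores details a real page would need, such as skipping script and style content.

```python
from html.parser import HTMLParser


class HeaderChunker(HTMLParser):
    """Collects (heading, body text) pairs, starting a new chunk at every h1-h6."""

    HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._heading = ""
        self._body = []
        self._in_header = False

    def _flush(self):
        # Store the section collected so far, then reset for the next one.
        if self._heading or self._body:
            self.chunks.append((self._heading, " ".join(self._body)))
        self._heading = ""
        self._body = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADER_TAGS:
            self._flush()
            self._in_header = True

    def handle_endtag(self, tag):
        if tag in self.HEADER_TAGS:
            self._in_header = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_header:
            self._heading = f"{self._heading} {text}".strip()
        else:
            self._body.append(text)

    def close(self):
        super().close()
        self._flush()  # emit the final section


def chunk_html_by_headers(html: str) -> list[tuple[str, str]]:
    parser = HeaderChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Keeping the heading next to its body text preserves some context for each chunk. A section whose body is still larger than the embedder's limit would need a second pass that splits it further.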

The above methods are easy, fast, and low cost.

The next set of methods is costly.

Another method is using embeddings to generate chunks. There are several approaches, all relying on the fact that the similarity between the embeddings of two texts indicates whether the texts are similar. Create small chunks based on sentences. Append sentences until the chunk size is reached while comparing similarity against some kind of relevancy threshold such as 0.95; when the similarity drops below the threshold, start a new chunk, with some overlap if needed. But this method needs a lot of calls to embedders and could become costly.
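One possible reading of this method, as a sketch: keep appending sentences while each new sentence stays similar to the chunk built so far, and start a new chunk otherwise. The `embed` argument is any function that returns a list of floats; `toy_embed` below is a hypothetical stand-in so the example runs without a model, and overlap is left out for brevity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(sentences, embed, threshold=0.95, max_chars=500):
    """Group consecutive sentences while they stay similar to the current chunk."""
    chunks = []
    current = []
    for sentence in sentences:
        if current:
            current_text = " ".join(current)
            # Two embedder calls per sentence: this is where the cost comes from.
            similar = cosine(embed(current_text), embed(sentence)) >= threshold
            fits = len(current_text) + 1 + len(sentence) <= max_chars
            if not (similar and fits):
                chunks.append(current_text)
                current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedder: counts two keywords."""
    lowered = text.lower()
    return [float(lowered.count("cat")), float(lowered.count("dog"))]
```

With `toy_embed`, the sentences "Cats purr.", "A cat naps.", "Dogs bark.", "A dog runs." end up in two chunks, one per topic. Real embeddings rarely reach 0.95 between neighbouring sentences, so the threshold needs tuning per model.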

Some text documents contain information on Topic A, then discuss Topic B, and then continue discussing Topic A. The previously discussed methods don’t handle this well.

This can be handled by using either embedders or LLMs, i.e. chunk into smaller pieces and then combine the related pieces. This is very costly because it uses LLMs. Greg Kamradt (https://twitter.com/GregKamradt) has explained these concepts in his GitHub repo and provided code samples in Python.

WebVeta does the chunking, embedding, storing, retrieving, and calling of LLMs for generating responses, and can be easily embedded in your website with 2 – 3 lines of HTML! SMEs can focus on their business goals and still provide an advanced internal search engine for people who come to their website looking for more information.

WebVeta is currently running a Kickstarter campaign – https://www.kickstarter.com/projects/kantikalyanarumilli/webveta-power-your-website-with-ai-search

Or contact me for a free trial of WebVeta.


Categories
A.I

An introduction to text chunking, embeddings

Cross Post – https://www.linkedin.com/pulse/introduction-text-chunking-embeddings-kanti-kalyan-arumilli-hvifc

In my previous post, I talked about text chunking. In this blog post, let’s look at text embeddings:

  1. Text Embeddings of various output lengths.
  2. Late Interaction embeddings

Text embeddings are arrays of floats. But most of the time, a single array cannot capture the full meaning of a given text. Late Interaction embeddings are arrays of arrays, representing the meaning of the text in different possible ways.
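To make the "array of arrays" shape concrete, here is a small sketch of how late interaction embeddings are commonly compared. It assumes ColBERT-style "MaxSim" scoring, which this post does not name: each query vector is matched with its best document vector and the best matches are summed. The vectors are tiny hand-made examples, not model output.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def max_sim_score(query_vectors: list[list[float]], doc_vectors: list[list[float]]) -> float:
    """For every query vector take the best-matching document vector, then sum."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# A single dense embedding is one array; a late interaction embedding is a list of arrays.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # covers both query vectors
doc_b = [[1.0, 0.0], [1.0, 0.0]]   # covers only the first one

score_a = max_sim_score(query, doc_a)
score_b = max_sim_score(query, doc_b)
```

Here `doc_a` scores higher than `doc_b` because it has a good match for every query vector, which a single averaged vector could blur.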

Once we chunk documents, we create embeddings of the text and store the chunks along with the embeddings in vector databases.

For retrieval, generate embeddings of the query text and query the database for documents. Vector databases retrieve documents based on how close the floats of the query are to those of the stored documents.

Later, I will write a blog post about some vector databases. There are a few different algorithms for retrieving documents, but for embeddings, cosine similarity is usually a good choice.
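For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch in plain Python (the function name is made up for this example; vector databases compute this internally and far more efficiently):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Returns a value in [-1, 1]; 1 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch; the value is undefined
    return dot / (norm_a * norm_b)
```

Because the result depends only on direction, `[1.0, 2.0]` and `[2.0, 4.0]` score as (almost exactly) 1, while perpendicular vectors such as `[1.0, 0.0]` and `[0.0, 1.0]` score 0.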

In the last blog post, I mentioned the loss of contextual information in chunks. Certain strategies can be used for adding context to chunks, and different strategies can be used for cleaning the chunks. Applying these before generating embeddings is usually effective.

The next 4 posts are going to be on the topics of: different vector databases, retrieving and reranking documents, some options for running LLMs, and generating RAG responses.

Or if you have a website or blog and need an advanced A.I-based, RAG-ready search engine without the hassle of the above, use WebVeta. WebVeta is perfect for keyword search, full-text search, and even RAG. RAG queries and responses are cached, which reduces LLM token costs.

Sign-up: https://webveta.alightservices.com/

WebVeta Blog – https://blog.alightservices.com/search/label/WebVeta

WebVeta Youtube Playlist – https://www.youtube.com/playlist?list=PLs7D8ybThnZhi9Sx17ft9_vRz5J0S_ZSZ


Categories
A.I Artificial Intelligence NLP

An introduction to vector embeddings

               Have you ever wondered how machines truly understand the meaning of words? While they lack the nuanced comprehension we humans possess, they’ve learned to represent words as numerical vectors, opening a world of possibilities for tasks like search, translation, and even creative writing.

Cross Post – https://www.linkedin.com/pulse/introduction-vector-embeddings-kanti-kalyan-arumilli-wqg8c

Imagine each word in the English language as a point in a multi-dimensional space. Words with similar meanings cluster together, while those with contrasting meanings reside farther apart. This is the essence of vector embeddings: representing words as numerical vectors, capturing semantic relationships between them.

These vectors aren’t arbitrary; they’re learned through sophisticated machine learning algorithms trained on massive text datasets.

Vector embeddings revolutionize how machines process language because:

Semantic Similarity:  Words with similar meanings have vectors that are close together in the “semantic space.” This allows machines to identify synonyms, antonyms, and even subtle relationships between words.

Contextual Understanding: Capturing the nuanced meaning of a word based on its surrounding words.

Improved Performance: Embedding vectors as input to machine learning models often leads to significant performance gains in tasks like text classification, sentiment analysis, machine translation and neural search.

There are various types of embeddings, such as Dense, Sparse, and Late Interaction. For each type there are several models, trained on various datasets and fine-tuned on different datasets. Their computational cost and requirements differ significantly. Some models need a lot of CPU and memory yet underperform, while others need less CPU and memory and still perform well. However, among models trained on the same datasets, those trained on a higher number of tokens usually outperform those trained on a lower number of tokens.

Here is a very interesting link – https://huggingface.co/spaces/mteb/leaderboard

The above page lists several models, their memory requirements, scores for various tasks, size of the embeddings generated, etc. Most of these models are free under the MIT license and some are commercial.

In the past, I wrote a blog post, https://www.alightservices.com/2024/04/27/how-to-get-text-embeddings-from-meta-llama-using-c-net/, about converting Llama 2 / 3 into gguf and how to interact with it using C#.

Most of the free models on the above leaderboard are available in gguf format and can be used directly from C#, or via a free local HTTP server such as ollama or llama.cpp, for getting embeddings. But some models don’t have a gguf version; some can probably be converted and some might not. Some models are available in onnx format. Some might need Python code for generating embeddings. I have tried IronPython, but I am not suggesting IronPython or any 3rd-party wrappers because they are less reliable. Here is a blog post about Python integration from .Net: https://www.alightservices.com/2024/04/09/c-net-python-and-nlp-natural-language-processing/

Proud partner of Microsoft for Startups

Categories
.Net A.I Azure C# LLM NLP

Using C# and Azure OpenAI services

If you have access to Azure OpenAI services, the following code snippet shows how to chat with ChatGPT!

Some general tips:

  1. Secure your networks, i.e. use private endpoints inside Azure! Stolen keys can be used by other people if the network is not secured.

This code example is the C# version of what’s discussed in https://github.com/AzureCosmosDB/Azure-OpenAI-Python-Developer-Guide/blob/main/05_Explore_OpenAI_models/README.md

The above link is for Python developers; this blog post is for C# developers.

// Assumes the Azure.AI.OpenAI NuGet package; the OpenAIClient API used here
// is the one from its 1.0.0-beta releases.
using Azure;
using Azure.AI.OpenAI;

// azureEndPoint and apiKey are string variables holding values from the Azure Portal.
var chatClient = new OpenAIClient(new Uri(azureEndPoint),
    new AzureKeyCredential(apiKey));

var chatCompletionOptions = new ChatCompletionsOptions();

// Name of the model deployment created in the Azure Portal.
chatCompletionOptions.DeploymentName = "gpt35";

// The system message sets the assistant's role.
chatCompletionOptions.Messages.Add(new
    ChatRequestSystemMessage("You are a helpful, fun and friendly sales assistant for Cosmic Works, a bicycle and bicycle accessories store."));

// Earlier turns of the conversation, followed by the new user question.
chatCompletionOptions.Messages.Add(new
    ChatRequestUserMessage("Do you sell bicycles?"));
chatCompletionOptions.Messages.Add(new
    ChatRequestAssistantMessage("Yes, we do sell bicycles. What kind of bicycle are you looking for?"));
chatCompletionOptions.Messages.Add(new
    ChatRequestUserMessage("I'm not sure what I'm looking for. Could you help me decide?"));

var response = await
    chatClient.GetChatCompletionsAsync(chatCompletionOptions);

// Print the first choice if the service returned one.
if (response != null && response.Value != null &&
    response.Value.Choices != null &&
    response.Value.Choices.Count > 0)
{
    System.Console.WriteLine(
        response.Value.Choices[0].Message.Content);
}

The above code has 3 configuration variables:

  1. azureEndPoint – This is the endpoint from Azure Portal.
  2. apiKey – One of the API keys from Azure Portal.
  3. The deployment name of the model that has been added through Azure Portal.

If the code runs successfully, the output looks like this:

[Screenshot: console output]

If there are network connectivity issues, there might be exceptions. If the request is denied due to network security rules, the errors might look like the following:

[Screenshot: error message]


Categories
A.I Artificial Intelligence Llama LLM

Have you tried Ollama – ChatGPT on your local machine, great software!

As most of you know, since around 2016 I have had an interest in Data Science / Machine Learning / Artificial Intelligence and even did some courses as a hobby! I am primarily a .Net full-stack web developer, but A.I has been fascinating and I have been a hobbyist!

In 2021 I started my own startup, in 2023 I prototyped a concept for a SaaS product known as WebVeta, and in 2024 I launched an MVP. Now is the time to dive into A.I. Over the past 2 weeks, I have been experimenting with several different things in A.I, from both a development perspective and a features perspective!

Over the past 2 days I have been playing around with a nice piece of software that allows working with several LLMs on a local machine! I would say you need at least 16GB of RAM, possibly slightly more.

https://ollama.com

https://github.com/ollama/ollama

The setup instructions are straightforward!

On the Github page, under “Community Integrations” -> “Web & Desktop” there are several web and desktop clients for UI, choose one of those based on your operating system and you can play around with a large set of A.I models. The list of models can be found at: https://ollama.com/library

Try llama3 or phi3 if you have enough CPU and RAM! Or try the smaller models: tinydolphin, tinyllama! There are several coding-related LLMs, i.e. GitHub Copilot-like, and there are some Visual Studio Code extensions that can communicate with a locally running version of Ollama and help with code!

Remember the LLMs need to be downloaded. The exact syntax is provided on each LLM’s page, but the general syntax is:

ollama pull <LLM_NAME>

I have used https://github.com/ollama-ui/ollama-ui on Linux, https://github.com/tgraupmann/WinForm_Ollama_Copilot for the client UI!

The client UIs query Ollama for the available local LLMs and allow selecting which particular LLM to interact with.

If anyone is interested, let me know via any of my social media profiles. I might consider doing a small demo for any enthusiasts!
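For the curious, a client can discover the locally available models roughly like this. This is a sketch based on my understanding of Ollama's REST API: it assumes Ollama is listening on its default address `http://localhost:11434` and that `GET /api/tags` returns a JSON object with a `models` list. The function names are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    payload = json.loads(tags_json)
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask a running Ollama server which models have been pulled locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as response:
        return parse_model_names(response.read().decode("utf-8"))
```

Calling `list_local_models()` only works while Ollama is running; the parsing step is kept separate so it can be tried without a server.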

Ollama is a great tool and great effort by the team of developers who developed Ollama! Thank you!

WebVeta – Advanced, unified, consistent search for your website(s), from content of your website(s), blogs(s). First 50 customers, who sign-up prior to 15/05/2024 get unlimited access to existing features, newer features for at least 1 year. Sign up now! https://webveta.alightservices.com/


Categories
.Net A.I Artificial Intelligence C# Llama LLM NLP

How to get text embeddings from Meta Llama using C# .Net

This post is about getting text embeddings i.e vector representation of text using C# .Net and using Meta’s Llama 2!

Meta’s Llama

Meta (Facebook) has released a few different LLMs, the latest being Llama3, but this blog post is about Llama2. Using Llama3 might be similar, but I have not tried it yet! There are a few more things that can be tried, but those are out of scope; this is an end-to-end blog post on using Llama2 from C#.

https://llama.meta.com/

From the above link, click “Download Models” and provide your information. You then receive links to some GitHub repos and some keys. Make a note of the keys. The keys are valid for 24 hours and each model can be downloaded 5 times.

llama.cpp

We use llama.cpp for certain activities:

https://github.com/ggerganov/llama.cpp

LLamaSharp

This is the wrapper for interacting from C# .Net with Llama models.

I have introduced the tools and software that are going to be used. Now, let’s look at the different steps:

  1. Download Llama model (Meta’s Llama has Llama 2 and Llama 3, each has smaller and larger models, this discusses the smallest model from Llama 2)
  2. Prepare and convert Llama model into gguf format.
  3. Use in C# code

Download Llama model:

Once you submit your information and receive the keys from Meta Facebook, clone the repo:

https://github.com/meta-llama/llama for Llama2,

https://github.com/meta-llama/llama3 for Llama3

git clone https://github.com/meta-llama/llama

Navigate into the llama folder, then run download.sh:

cd llama
sudo ./download.sh

You will be prompted for the download key; enter the key.

Now a 12.5 GB file gets downloaded into a folder named “llama-2-7b”.

Prepare and convert Llama model into gguf format:

We are going to convert the Llama model into gguf format. For this we need Python3 and Python3-Pip. If these are not installed, install them using the following command:

sudo apt install python3 python3-pip

Clone the llama.cpp repo into a different directory.

git clone https://github.com/ggerganov/llama.cpp

Navigate into llama.cpp and compile

cd llama.cpp
make -j

Install the requirements for Python:

python3 -m pip install -r requirements.txt

Now copy the entire “llama-2-7b” folder into llama.cpp/models.

Listing the models directory should show “llama-2-7b”. Then run the conversion:

ls ./models
python3 convert.py models/llama-2-7b/

This generates a 2.17 GB file ggml-model-f32.gguf

Now run the following command:

./quantize ./models/llama-2-7b/ggml-model-f32.gguf ./models/llama-2-7b/ggml-model-Q4_K_M.gguf Q4_K_M

This should generate a 3.79 GB file.

Optional (I have NOT tried this yet)

The following extra params can be passed to convert.py:

python3 convert.py models/llama-2-7b/ --vocab-type bpe

C# code

Create a new project or, in an existing project, add the following NuGet packages:

LLamaSharp

LLamaSharp.Backend.Cpu or LLamaSharp.Backend.Cuda11 or 
LLamaSharp.Backend.Cuda12 or LLamaSharp.Backend.OpenCL

// I used LLamaSharp.Backend.Cpu

Use the following using statements:

using LLama;
using LLama.Common;

The following code is adapted from the samples of LlamaSharp – https://github.com/SciSharp/LLamaSharp/blob/master/LLama.Examples/Examples/GetEmbeddings.cs

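// Replace the placeholder with the path of the .gguf file produced by the quantize step.
string modelPath = "PATH_TO_GGUF_FILE";

// EmbeddingMode must be enabled to get embeddings from the model.
var @params = new ModelParams(modelPath) { EmbeddingMode = true };
using var weights = LLamaWeights.LoadFromFile(@params);
var embedder = new LLamaEmbedder(weights, @params);

Use the path of the .gguf file generated in the quantize step.

Here is the code for getting embeddings:

// GetEmbeddings returns a Task, so await it instead of blocking on .Result.
float[] embeddings = await embedder.GetEmbeddings("Hello, this is sample text for embeddings");

Hope this helps some people. I am a .Net developer (primarily C#) and an A.I enthusiast.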
