
An introduction to text chunking for purposes of vector embedding.

Cross post – https://www.linkedin.com/pulse/introduction-text-chunking-purposes-vector-embedding-arumilli-zf6vc

In a previous post, “An introduction to vector embeddings”, I mentioned vector embeddings and posted a link to the MTEB leaderboard. In this article I am going to discuss “chunking”.

WebVeta does this and much more in the A.I tier: sign up, add your domains and blogs, validate ownership of your websites, adjust a few settings, copy and paste 2-3 lines of HTML, and relax. WebVeta has a Kickstarter campaign running and is offering significant discounts via Kickstarter.

In the MTEB leaderboard, look at the 6th column, “Max tokens”: the maximum number of tokens the embedding model accepts as input.

When using an embedding model with a smaller max token limit, and we want to store large texts, we need to chunk the text.

In certain cases, even if an embedding model supports a large max token limit, we may want to look for specific information in a text document, and then we still need to chunk. Let’s say I have a document of 30,000 tokens. When I am searching for something, if I need the entire document, I can use a model with a 32k max token limit. But what if I want just the small paragraph that has the information, and I don’t care about the rest of the document? For handling this “looking for specific information” type of scenario, we can chunk the document into smaller parts and then index the chunks rather than the whole document.

For example: “Mr. Kanti Kalyan Arumilli is the founder and CEO of ALight Technology And Services Limited and ALight Technologies USA Inc” is approximately 33 tokens.
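To make this concrete, here is a minimal fixed-size chunking sketch in C#. It approximates tokens by splitting on whitespace (a real tokenizer, such as the one used by your embedding model, counts differently), and the chunkSize and overlap parameters are illustrative defaults, not values from any specific library:

using System;
using System.Collections.Generic;

static class Chunker
{
    // Splits text into chunks of roughly chunkSize tokens (approximated here by words),
    // with `overlap` words shared between consecutive chunks.
    public static List<string> Chunk(string text, int chunkSize = 512, int overlap = 64)
    {
        var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        for (int start = 0; start < words.Length; start += chunkSize - overlap)
        {
            int count = Math.Min(chunkSize, words.Length - start);
            chunks.Add(string.Join(" ", words, start, count));
            if (start + count >= words.Length) break;
        }
        return chunks;
    }
}

The overlap keeps a few words shared between consecutive chunks, which helps with the context problem discussed below.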

There could be other use-cases for chunking.

But chunking loses context. In a large document, let’s say about some person, when the entire document is in context, we know from the full text that the document is about that person. When we use chunks, some paragraph might use the term ‘he’ / ‘him’, and when that paragraph is seen in isolation, we lose the context. Chunks that are too small are a problem; chunks that are too big are not useful in certain use-cases.

Various approaches need to be considered for retaining or adding context, and there is no absolute answer for the best chunk size. I have seen decent results with chunks of sizes between 384 – 1024 tokens.
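One simple sketch of the “adding context” idea (an illustrative approach, not the only one): prepend a small document-level header, such as the title, to every chunk before embedding, so a chunk that says ‘he’ / ‘him’ still carries who the document is about. The ContextualChunker below is hypothetical and builds on the Chunker sketch from earlier:

using System.Collections.Generic;
using System.Linq;

static class ContextualChunker
{
    // Prepends a document-level header (e.g. the title) to each chunk so pronouns
    // like 'he' / 'him' can still be resolved when a chunk is embedded in isolation.
    public static List<string> ChunkWithContext(string title, string text,
        int chunkSize = 512, int overlap = 64)
    {
        return Chunker.Chunk(text, chunkSize, overlap)
                      .Select(chunk => $"Document: {title}\n{chunk}")
                      .ToList();
    }
}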

I am planning a few more blog posts over the next few weeks regarding LLMs, RAG, vector databases, storing, retrieving, etc…



How to get text embeddings from Meta Llama using C# .Net

This post is about getting text embeddings, i.e. vector representations of text, using C# .Net and Meta’s Llama 2!

Meta’s Llama

Meta (Facebook) has released a few different LLMs, the latest being Llama 3, but this blog post is about Llama 2. Using Llama 3 might be similar, but I have not tried it yet! There are a few more things that can be tried, but those are out of scope; this is an end-to-end blog post for using Llama 2 from C#.

https://llama.meta.com/

From the above link, click “Download Models” and provide your information. Then links to a GitHub repo and some download keys are provided. Make note of the keys; the keys are valid for 24 hours and each model can be downloaded 5 times.

llama.cpp

We use llama.cpp for converting and quantizing the model:

https://github.com/ggerganov/llama.cpp

LLamaSharp

This is the wrapper for interacting with Llama models from C# .Net.

I have introduced the tools and software that are going to be used. Now, let’s look at the different steps:

  1. Download the Llama model (Meta’s Llama has Llama 2 and Llama 3, each with smaller and larger models; this post discusses the smallest model from Llama 2)
  2. Prepare and convert the Llama model into gguf format
  3. Use it in C# code

Download Llama model:

Once you submit your information and receive the keys from Meta (Facebook), clone the repo:

https://github.com/meta-llama/llama for Llama2,

https://github.com/meta-llama/llama3 for Llama3

git clone https://github.com/meta-llama/llama

Navigate into the llama folder, then run download.sh:

cd llama
sudo ./download.sh

You will be prompted for the download key; enter the key.

Now a 12.5 GB file gets downloaded into the folder “llama-2-7b”.

Prepare and convert Llama model into gguf format:

We are going to convert the Llama model into gguf format. For this we need Python3 and Python3-Pip; if these are not installed, install them using the following command:

sudo apt install python3 python3-pip

Clone the llama.cpp repo into a different directory.

git clone https://github.com/ggerganov/llama.cpp

Navigate into llama.cpp and compile

cd llama.cpp
make -j

Install the Python requirements:

python3 -m pip install -r requirements.txt

Now copy the entire “llama-2-7b” folder into llama.cpp/models (the source path below assumes both repos were cloned side by side; adjust it to wherever download.sh saved the model):

cp -r ../llama/llama-2-7b ./models/

Listing the models directory should show “llama-2-7b”. Then run the conversion script:

ls ./models
python3 convert.py models/llama-2-7b/

This generates a 2.17 GB file, ggml-model-f32.gguf.

Now run the following command:

./quantize ./models/llama-2-7b/ggml-model-f32.gguf ./models/llama-2-7b/ggml-model-Q4_K_M.gguf Q4_K_M

This should generate a 3.79 GB file.

Optional (I have NOT tried this yet)

The following extra parameter can be passed to convert.py:

python3 convert.py models/llama-2-7b/ --vocab-type bpe

C# code

Create a new project, or in an existing project, add the following NuGet packages:

LLamaSharp

LLamaSharp.Backend.Cpu or LLamaSharp.Backend.Cuda11 or LLamaSharp.Backend.Cuda12 or LLamaSharp.Backend.OpenCL

// I used LLamaSharp.Backend.Cpu

Use the following using statements:

using LLama;
using LLama.Common;

The following code is adapted from the LLamaSharp samples – https://github.com/SciSharp/LLamaSharp/blob/master/LLama.Examples/Examples/GetEmbeddings.cs

string modelPath = "PATH_TO_GGUF_FILE"; // use the path of your .gguf file from the quantize step

var @params = new ModelParams(modelPath) { EmbeddingMode = true };
using var weights = LLamaWeights.LoadFromFile(@params);
var embedder = new LLamaEmbedder(weights, @params);

Here is code for getting embeddings:

float[] embeddings = embedder.GetEmbeddings("Hello, this is sample text for embeddings").Result;
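Putting the pieces together, here is a minimal end-to-end sketch. It assumes the same LLamaSharp version and API as the snippets above (method names such as GetEmbeddings may differ in newer releases, so check the samples for your version):

using System;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

class Program
{
    static async Task Main()
    {
        // Path of the .gguf file produced by the quantize step
        string modelPath = "PATH_TO_GGUF_FILE";

        // EmbeddingMode = true loads the model for embeddings rather than text generation
        var @params = new ModelParams(modelPath) { EmbeddingMode = true };
        using var weights = LLamaWeights.LoadFromFile(@params);
        var embedder = new LLamaEmbedder(weights, @params);

        float[] embeddings = await embedder.GetEmbeddings("Hello, this is sample text for embeddings");

        // The vector length depends on the model; printing it is a quick sanity check
        Console.WriteLine($"Embedding dimensions: {embeddings.Length}");
    }
}

The resulting float[] is the vector you would store in a vector database and compare against query embeddings.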

Hope this helps some people. I am a .Net developer (primarily C#) and an A.I enthusiast.
