This post is about getting text embeddings, i.e., vector representations of text, using C# .NET and Meta’s Llama 2!
Meta’s Llama
Meta (Facebook) has released a few different LLMs, the latest being Llama 3, but this blog post is about Llama 2. Using Llama 3 might be similar, but I have not tried it yet! There are a few more things that could be tried, but those are out of scope; this is an end-to-end blog post for using Llama 2 from C#.
From the Meta Llama page, click “Download Models” and provide your information. You will then receive links to some GitHub repos and some download keys. Make note of the keys: they are valid for 24 hours, and each model can be downloaded 5 times.
llama.cpp
We use llama.cpp to convert and quantize the downloaded model:
https://github.com/ggerganov/llama.cpp
LLamaSharp
This is the wrapper for interacting with Llama models from C# .NET.
I have introduced the tools and software that are going to be used. Now, let’s look at the different steps:
- Download the Llama model (Meta has Llama 2 and Llama 3, each with smaller and larger variants; this post uses the smallest Llama 2 model)
- Prepare and convert Llama model into gguf format.
- Use in C# code
Download Llama model:
Once you submit your information and receive the keys from Meta (Facebook), clone the repo:
https://github.com/meta-llama/llama for Llama2,
https://github.com/meta-llama/llama3 for Llama3
git clone https://github.com/meta-llama/llama
Navigate into the llama folder, then run download.sh:
cd llama
sudo ./download.sh
You will be prompted for the download key; enter the key.
Now a 12.5 GB model gets downloaded into a folder named “llama-2-7b”.
Prepare and convert Llama model into gguf format:
We are going to convert the Llama model into gguf format. For this we need Python 3 and pip; if these are not installed, install them using the following command:
sudo apt install python3 python3-pip
Clone the llama.cpp repo into a different directory.
git clone https://github.com/ggerganov/llama.cpp
Navigate into llama.cpp and compile:
cd llama.cpp
make -j
Install the Python requirements:
python3 -m pip install -r requirements.txt
Now copy the entire “llama-2-7b” into llama.cpp/models.
Listing the models directory should show “llama-2-7b”:
ls ./models
Run the conversion script:
python3 convert.py models/llama-2-7b/
This generates a 2.17 GB file, ggml-model-f32.gguf.
Now run the following command:
./quantize ./models/llama-2-7b/ggml-model-f32.gguf ./models/llama-2-7b/ggml-model-Q4_K_M.gguf Q4_K_M
This should generate a 3.79 GB file.
Optional (I have NOT tried this yet)
The following extra parameter can be passed to convert.py:
python3 convert.py models/llama-2-7b/ --vocab-type bpe
C# code
Create a new project, or in an existing project, add the following NuGet packages:
LLamaSharp
LLamaSharp.Backend.Cpu or LLamaSharp.Backend.Cuda11 or LLamaSharp.Backend.Cuda12 or LLamaSharp.Backend.OpenCL, depending on your hardware (I used LLamaSharp.Backend.Cpu).
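If you work from the command line, the same packages can be added with the dotnet CLI (shown here for the CPU backend; swap in the backend package that matches your hardware):

```shell
# Add LLamaSharp and the CPU backend to the current project
dotnet add package LLamaSharp
dotnet add package LLamaSharp.Backend.Cpu
```

These commands edit the project's .csproj, adding a PackageReference for each package.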
Use the following using statements:
using LLama;
using LLama.Common;
The following code is adapted from the LLamaSharp samples – https://github.com/SciSharp/LLamaSharp/blob/master/LLama.Examples/Examples/GetEmbeddings.cs
string modelPath = "PATH_TO_GGUF_FILE"; // use the path of the .gguf file generated in the quantize step
var @params = new ModelParams(modelPath) { EmbeddingMode = true };
using var weights = LLamaWeights.LoadFromFile(@params);
var embedder = new LLamaEmbedder(weights, @params);
Here is code for getting embeddings:
float[] embeddings = embedder.GetEmbeddings("Hello, this is sample text for embeddings").Result;
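Embedding vectors are typically compared with cosine similarity (e.g., to find which texts are semantically closest). As a quick sanity check, here is a small stand-alone helper in plain C# with no LLamaSharp dependency; the vectors below are illustrative stand-ins for real embeddings:

```csharp
using System;

// Two tiny example vectors (stand-ins for real embedding vectors).
float[] a = { 1f, 0f, 0f };
float[] b = { 0f, 1f, 0f };

Console.WriteLine(CosineSimilarity(a, a)); // identical vectors -> 1
Console.WriteLine(CosineSimilarity(a, b)); // orthogonal vectors -> 0

// Cosine similarity: dot(x, y) / (|x| * |y|); lies in [-1, 1] for non-zero vectors.
static double CosineSimilarity(float[] x, float[] y)
{
    if (x.Length != y.Length)
        throw new ArgumentException("Vectors must be the same length.");
    double dot = 0, normX = 0, normY = 0;
    for (int i = 0; i < x.Length; i++)
    {
        dot += x[i] * y[i];
        normX += x[i] * x[i];
        normY += y[i] * y[i];
    }
    return dot / (Math.Sqrt(normX) * Math.Sqrt(normY));
}
```

With real embeddings, pass the two float[] arrays returned by embedder.GetEmbeddings(...) for the texts you want to compare; values closer to 1 mean more similar texts.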
Hope this helps some people! I am a .NET developer (primarily C#) and an A.I. enthusiast.
–
Mr. Kanti Kalyan Arumilli
B.Tech, M.B.A
Founder & CEO, Lead Full-Stack .Net developer
ALight Technology And Services Limited
Phone / SMS / WhatsApp on the following 3 numbers:
+91-789-362-6688, +1-480-347-6849, +44-07718-273-964
+44-33-3303-1284 (Preferred number if calling from U.K, No WhatsApp)
kantikalyan@gmail.com, kantikalyan@outlook.com, admin@alightservices.com, kantikalyan.arumilli@alightservices.com, KArumilli2020@student.hult.edu, KantiKArumilli@outlook.com and 3 more rarely used email addresses – hardly once or twice a year.