
An introduction to text chunking, embeddings

Cross Post – https://www.linkedin.com/pulse/introduction-text-chunking-embeddings-kanti-kalyan-arumilli-hvifc

In my previous post, I talked about text chunking. In this post, let's look at text embeddings:

  1. Text embeddings of various output lengths
  2. Late interaction embeddings

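A standard text embedding is a single array of floats (a vector) for the whole text. One vector often cannot capture all of the meaning of a longer text. Late interaction embeddings are an array of arrays: typically one vector per token, so different parts of the text keep their own representation, and each part of the query can be matched against them at search time.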
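To make the difference concrete, here is a minimal pure-Python sketch of the MaxSim scoring used by ColBERT-style late interaction models. The function names are hypothetical, and the vectors are tiny hand-written toy values, not output from a real embedding model.

```python
import math

Vector = list[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def maxsim_score(query_vecs: list[Vector], doc_vecs: list[Vector]) -> float:
    """Late interaction (MaxSim) score.

    For every query token vector, find its best-matching document token vector,
    then add up those best matches.
    """
    return sum(max(cosine_similarity(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-D "token vectors", hand-written for illustration (not from a real model).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.6, 0.8]]
print(round(maxsim_score(query_tokens, doc_tokens), 6))  # 1.8
```

A single-vector model would compress each text into one vector and compute a single cosine similarity. Late interaction keeps per-token vectors instead, which costs more storage but preserves finer-grained matches between query and document.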

Once we chunk documents, we generate an embedding for each chunk and store the chunks, along with their embeddings, in a vector database.

For retrieval, we generate an embedding of the query text with the same embedding model and query the database. The vector database returns the documents whose embeddings are closest to the query's embedding.
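The indexing and retrieval steps described above can be sketched in a few lines of Python. This is a minimal sketch under clear assumptions: `toy_embed` is a hypothetical hashing-based stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical in-memory list standing in for a real vector database.

```python
import hashlib
import math
import re

DIMS = 256  # toy size; real embedding models output hundreds or thousands of floats

def toy_embed(text: str) -> list[float]:
    """HYPOTHETICAL stand-in for a real embedding model.

    Hashes each lowercase word into one of DIMS buckets, counts occurrences,
    then L2-normalizes the vector so it has unit length.
    """
    vec = [0.0] * DIMS
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIMS
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Minimal stand-in for a vector database: keeps (chunk, embedding) pairs in a list."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, chunk: str) -> None:
        # Indexing: embed the chunk once and store it alongside the text.
        self.items.append((chunk, toy_embed(chunk)))

    def search(self, query: str, top_k: int = 2) -> list[tuple[str, float]]:
        # Retrieval: embed the query with the SAME function used at indexing time.
        q = toy_embed(query)
        # Both vectors have unit length, so the dot product equals cosine similarity.
        scored = [(chunk, sum(a * b for a, b in zip(q, emb))) for chunk, emb in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
for chunk in [
    "Vector databases store embeddings alongside text chunks.",
    "Cosine similarity compares the direction of two vectors.",
    "Bake the bread for forty minutes.",
]:
    store.add(chunk)

for chunk, score in store.search("How do vector databases store embeddings?", top_k=1):
    print(f"{score:.3f}  {chunk}")
```

A real vector database replaces the linear scan in `search` with an index so that it scales to millions of chunks, but the idea is the same: compare the query embedding to the stored embeddings and return the closest ones.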

I will write a blog post about some vector databases later. There are a few different ways to measure how close two embeddings are (for example, cosine similarity, dot product and Euclidean distance), but for text embeddings, cosine similarity is usually a good choice.
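For reference, cosine similarity between a query embedding q and a document embedding d, each of n floats, compares only the direction of the two vectors and ignores their length:

```latex
\cos(\theta) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
             = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}}

% Worked example: q = (1, 0), d = (0.6, 0.8)
\cos(\theta) = \frac{1 \cdot 0.6 + 0 \cdot 0.8}{\sqrt{1^2 + 0^2}\,\sqrt{0.6^2 + 0.8^2}} = \frac{0.6}{1 \cdot 1} = 0.6
```

The result ranges from -1 to 1, with higher values meaning more similar. When embeddings are normalized to unit length, the denominator is 1 and cosine similarity reduces to the plain dot product.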

In the last blog post, I mentioned that chunks can lose contextual information. Certain strategies can add context back into chunks, and other strategies can clean the chunks; adding context and cleaning the chunks before generating embeddings is usually an effective approach.
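One common way to add context back is to prepend document-level information, such as the title and section heading, to each chunk after a light cleaning pass and before embedding it. This is a minimal sketch that assumes HTML remnants and extra whitespace are the noise to remove; `clean_chunk` and `contextualize_chunk` are hypothetical helper names, and real pipelines may use different cleaning rules and context fields.

```python
import re

def clean_chunk(text: str) -> str:
    """Example cleaning rules (assumed): drop HTML tags, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)      # replace tags like <p> with a space
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of spaces/newlines

def contextualize_chunk(chunk: str, doc_title: str, section: str) -> str:
    """Prepend document-level context so the chunk makes sense on its own."""
    return f"Document: {doc_title}\nSection: {section}\n\n{clean_chunk(chunk)}"

# "It" is ambiguous on its own; the header tells the embedding model what "it" refers to.
raw_chunk = "<p>It can be carried\n over to   the next year.</p>"
print(contextualize_chunk(raw_chunk, "Employee Handbook", "Annual leave policy"))
```

The string returned by `contextualize_chunk` is what gets embedded; the stored chunk text can stay clean so the original wording is still available when building a RAG response.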

The next 4 posts will cover different vector databases, retrieving and reranking documents, some options for running LLMs, and generating RAG responses.

Or, if you have a website or blog and need an advanced A.I-based, RAG-ready search engine without the hassle of the above, use WebVeta. WebVeta is perfect for keyword search, full-text search and even RAG. RAG queries and responses are cached, which reduces LLM token costs.

Sign-up: https://webveta.alightservices.com/

WebVeta Blog – https://blog.alightservices.com/search/label/WebVeta

WebVeta Youtube Playlist – https://www.youtube.com/playlist?list=PLs7D8ybThnZhi9Sx17ft9_vRz5J0S_ZSZ

Mr. Kanti Kalyan Arumilli

Arumilli Kanti Kalyan, Founder & CEO

B.Tech, M.B.A

Founder & CEO, Lead Full-Stack .NET developer

ALight Technology And Services Limited

ALight Technologies USA Inc

Phone / SMS / WhatsApp on the following 3 numbers:

+91-789-362-6688, +1-480-347-6849, +44-07718-273-964

+44-33-3303-1284 (Preferred number if calling from U.K, No WhatsApp)

kantikalyan@gmail.com, kantikalyan@outlook.com, admin@alightservices.com, kantikalyan.arumilli@alightservices.com, KArumilli2020@student.hult.edu, KantiKArumilli@outlook.com and 3 more rarely used email addresses – hardly once or twice a year.