Cross Post – https://www.linkedin.com/pulse/introduction-text-chunking-embeddings-kanti-kalyan-arumilli-hvifc
In my previous post, I talked about text chunking. In this blog post, let’s look at text embeddings:
- Text Embeddings of various output lengths.
- Late Interaction embeddings
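A text embedding is an array of floats that represents a piece of text; the length of the array depends on the embedding model. Most of the time, though, a single array cannot capture the full meaning of a given text. Late interaction embeddings are arrays of arrays: several vectors (typically one per token) that represent the meaning of the text in different possible ways.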
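To make the difference in shape concrete, here is a minimal sketch in plain Python. All numbers are made up for illustration and do not come from any real model. The `maxsim_score` function is a hypothetical helper showing one common way late interaction embeddings are compared (for each query vector, take its best match among the document vectors, then sum); it is an assumption about the scoring, not a description of any particular library.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vectors, doc_vectors):
    """Late interaction style score: for every query vector, keep its best
    dot product against any document vector, then sum those best values."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

# A regular text embedding: ONE array of floats for the whole text (made-up values).
single_vector_embedding = [0.12, -0.48, 0.33]

# A late interaction embedding: an array of arrays (made-up, unit-length values).
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]    # has a close match for both query vectors
doc_b = [[0.0, -1.0], [-1.0, 0.0]]  # has no good match for either query vector

score_a = maxsim_score(query_vectors, doc_a)
score_b = maxsim_score(query_vectors, doc_b)
print(score_a > score_b)  # doc_a is the better match
```

Because every query vector is matched separately, a document can score well even when only parts of it are relevant to parts of the query.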
Once we chunk documents, we create embeddings of the text and store the chunks along with the embeddings in vector databases.
For retrieval, we generate an embedding of the query text and query the database for documents. Vector databases retrieve documents based on how close the floats of the query embedding are to those of the stored documents.
I will write a later blog post about some vector databases. There are a few different algorithms for comparing embeddings when retrieving documents, but cosine similarity is usually a good choice.
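The store-and-retrieve flow can be sketched with an in-memory list standing in for a vector database. This is only an illustration: the chunk texts and embedding values are made up, `search` is a hypothetical helper, and a real vector database would typically use an index rather than scanning every row.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-in for a vector database: each row is (chunk text, embedding).
# The embeddings are made-up 3-float arrays, not real model output.
store = [
    ("Chunk about cats", [0.9, 0.1, 0.0]),
    ("Chunk about dogs", [0.7, 0.7, 0.0]),
    ("Chunk about taxes", [0.0, 0.1, 0.9]),
]

def search(query_embedding, rows, top_k=2):
    """Score every stored chunk against the query and return the best top_k."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# In practice the query embedding comes from the same model as the chunks.
results = search([1.0, 0.0, 0.0], store)
for score, text in results:
    print(round(score, 3), text)
```

Cosine similarity compares the direction of two vectors and ignores their length, which is one reason it tends to work well for embeddings.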
In the last blog post, I mentioned the loss of contextual information in chunks. Certain strategies can be used to add context back into chunks, and other strategies can be used to clean the chunks; applying them before generating embeddings is usually effective.
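As one possible example of such strategies (an assumption for illustration, not the only approach), the sketch below collapses stray whitespace in a chunk and prepends the document title and section heading before the text is sent for embedding. The function names and the header format are hypothetical.

```python
import re

def clean_text(text):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def add_context(chunk, document_title, section_heading):
    """Prefix a cleaned chunk with where it came from, so the embedding
    also reflects the document and section it belongs to."""
    return (
        f"Document: {document_title}\n"
        f"Section: {section_heading}\n\n"
        f"{clean_text(chunk)}"
    )

raw_chunk = "  It supports   keyword search\n\nand full-text search. "
text_to_embed = add_context(raw_chunk, "Product FAQ", "Search features")
print(text_to_embed)
```

The original chunk can still be stored alongside the embedding, so the added header only influences retrieval and not what is shown to the reader.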
The next 4 posts are going to cover: different vector databases, retrieving and reranking documents, some options for running LLMs, and generating RAG responses.
Or, if you have a website or blog and need an advanced A.I-based, RAG-ready search engine without the hassle of the above, use WebVeta. WebVeta is perfect for keyword search, full-text search and even RAG. RAG queries and responses are cached, which reduces LLM token costs.
Sign-up: https://webveta.alightservices.com/
WebVeta Blog – https://blog.alightservices.com/search/label/WebVeta
WebVeta Youtube Playlist – https://www.youtube.com/playlist?list=PLs7D8ybThnZhi9Sx17ft9_vRz5J0S_ZSZ
–
Mr. Kanti Kalyan Arumilli
B.Tech, M.B.A
Founder & CEO, Lead Full-Stack .Net developer
ALight Technology And Services Limited
Phone / SMS / WhatsApp on the following 3 numbers:
+91-789-362-6688, +1-480-347-6849, +44-07718-273-964
+44-33-3303-1284 (Preferred number if calling from U.K, No WhatsApp)
kantikalyan@gmail.com, kantikalyan@outlook.com, admin@alightservices.com, kantikalyan.arumilli@alightservices.com, KArumilli2020@student.hult.edu, KantiKArumilli@outlook.com and 3 more rarely used email addresses – hardly once or twice a year.