Add free search for your website. Sign up now! https://webveta.alightservices.com/
Categories
WebVeta

Unlock the Power of A.I-based Search with the RAG Premium Tier

Are you looking to take your website’s search functionality to the next level? Do you want to provide your users with a seamless and accurate search experience that sets your site apart from the competition? Look no further than WebVeta’s RAG Premium tier.

Early birds who sign up for the RAG Premium tier on or after November 4 get free embedding tokens.

What is RAG Premium Tier?

The RAG Premium tier is an advanced search solution that combines full-text, keyword, and semantic search algorithms to deliver highly accurate and relevant results. But that’s not all: with RAG Premium, you can also leverage Generative A.I through Large Language Models (LLMs), allowing your users to get precise answers to their questions.
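WebVeta does not document how it merges the results of its different search algorithms. Purely as an illustration, one widely used way to combine several ranked result lists (for example, one from full-text search and one from semantic search) is reciprocal rank fusion (RRF). The sketch below assumes each engine returns document IDs ordered best-first; the constant k=60 is the value commonly used with RRF, not a WebVeta setting, and the document IDs are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in
    (rank is 1-based), so documents ranked well by several engines rise.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from two engines for the same query.
full_text = ["pricing", "faq", "blog-rag"]
semantic = ["blog-rag", "pricing", "about"]
print(reciprocal_rank_fusion([full_text, semantic]))
# ['pricing', 'blog-rag', 'faq', 'about']
```

RRF only needs ranks, not raw scores, which is why it is a common choice when the engines being combined produce scores on incompatible scales.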

The Advantages of RAG Premium Tier:

With RAG Premium tier, you’ll experience:

1. Semantic Search: Our intent-based search engine uses machine learning algorithms to understand the context and intent behind your users’ queries, ensuring that they get the most relevant answers.

2. Easy Integration of Generative A.I: Simply copy and paste a few lines of HTML, and you can unlock the power of Generative A.I to provide precise answers from your own content.

3. Improved User Experience: With RAG Premium tier, your users will be able to find what they’re looking for quickly and easily, leading to increased satisfaction and engagement.

Get Ahead of the Competition

In a crowded digital landscape, standing out from the competition is crucial. With RAG Premium tier, you can provide your users with an unparalleled search experience that sets your site apart from the rest.

What’s Next?

WebVeta’s RAG Premium tier is just one part of our comprehensive search solution. Join us on November 4 for the official launch and discover how we can help take your website to new heights.

Premium RAG tier members get free token credits for 2 months, until January: free embedding tokens, free internal processing tokens, and enough free tokens for about 1,500 ChatGPT-style responses, with the responses based on your own content.

Get Ready to Experience the Future of Search

Don’t miss out on this opportunity to transform your website’s search functionality.

Sign up for a free trial: https://webveta.alightservices.com/

Mr. Kanti Kalyan Arumilli

Arumilli Kanti Kalyan, Founder & CEO

B.Tech, M.B.A

Facebook

LinkedIn

Threads

Instagram

Youtube

Founder & CEO, Lead Full-Stack .Net developer

ALight Technology And Services Limited

ALight Technologies USA Inc

Youtube

Facebook

LinkedIn

Phone / SMS / WhatsApp on the following 3 numbers:

+91-789-362-6688, +1-480-347-6849, +44-07718-273-964

+44-33-3303-1284 (Preferred number if calling from U.K, No WhatsApp)

kantikalyan@gmail.com, kantikalyan@outlook.com, admin@alightservices.com, kantikalyan.arumilli@alightservices.com, KArumilli2020@student.hult.edu, KantiKArumilli@outlook.com and 3 more rarely used email addresses – hardly once or twice a year.

Categories
WebVeta

WebVeta production release delayed, now expected in a week: Full-Text Basic Tier features and advantages

In today’s digital landscape, providing an exceptional user experience is crucial for businesses to stay ahead of the competition. One key aspect of this is offering relevant and accurate search results that cater to their users’ needs. This is where WebVeta comes in – a revolutionary SaaS product that empowers businesses to deliver personalized search results like never before.

What sets WebVeta apart?

WebVeta is an easily embeddable search engine that can be integrated into any website with just a few lines of HTML. This blog post is about full-text search, but some tiers offer more advanced technology that combines full-text search, keyword search, and AI-powered intent-based search to provide the most relevant results. What truly sets WebVeta apart is its ability to fetch relevant search results from a main domain as well as its sub-domains. This means users receive search results tailored to their specific needs, making the overall experience more streamlined and effective.

Yesterday, I wrote a blog post about the generous free-tier: Use WebVeta’s Free Tier for Internal Search for Small Websites

Advantages of using WebVeta’s Full-Text Basic tier

So, why should you choose WebVeta over other competitors in the market? Here are just a few advantages of this innovative SaaS product:

1.  Unparalleled relevance: With WebVeta’s ability to fetch results from both main domains and their sub-domains, users can expect highly relevant search results that cater to their specific needs.

2.  Improved user experience: By providing personalized search results, businesses can enhance the overall user experience, leading to increased engagement, retention, and ultimately, revenue.

3.  Scalability: WebVeta’s paid tiers offer flexible pricing plans that cater to the needs of growing businesses, ensuring they can scale their search functionality as needed.

WebVeta has a generous free tier, offering 3 to 4 times more free usage than some competitors; at some other competitors’ prices, the same usage would cost almost $30 to $40.

The Full-Text Basic tier will be priced at $29.99 (introductory offer) and can be considered equivalent to the $90 to $120 pricing levels of some competitors. It includes up to 5 domains, sub-domains, or blogs; free domain and sub-domain search; 600 pages; and 60k requests per month.

Non-profits and charitable organizations get double the limits: 1,200 pages and 120k requests per month.

Anyone who needs slightly higher limits can contact me by email, and I can increase them.

With its innovative technology and generous pricing plans, WebVeta is poised to revolutionize the way businesses approach search functionality. Stay tuned for the launch of our paid tiers in the coming week and get ready to experience the power of personalized search results like never before!

To learn more about WebVeta or to discuss how you can integrate this game-changing SaaS product into your business, please don’t hesitate to contact us via email at admin@alightservices.com. We look forward to helping you unlock the full potential of WebVeta!


Categories
WebVeta

Use WebVeta’s Free Tier for Internal Search for Small Websites

Cross Post – https://www.linkedin.com/pulse/use-webvetas-free-tier-internal-search-small-websites-arumilli-d3mqc

As a small or medium-sized website owner, you understand the importance of keeping your users engaged and on your site for as long as possible. One key factor that can make or break the user experience is internal search functionality. Search not only helps users find what they’re looking for quickly but also improves overall satisfaction.

That’s where WebVeta comes in – a cutting-edge, easily embeddable search engine designed to enhance the search experience on any website. Best of all, you can take advantage of our free tier, perfect for small and medium-sized websites with low traffic.

What Makes WebVeta Stand Out?

WebVeta’s free tier provides full-text search and lets your users receive highly relevant results, making visits to your website more productive and enjoyable.

How Can Small Websites Benefit from WebVeta’s Free Tier?

WebVeta’s free tier is generous in its limits. You can embed our search engine on 1 website, with up to 250 pages and 5,000 search requests per month, which suits most small to medium-sized websites. This capacity can handle typical traffic volumes without additional costs.

The same person can manage multiple websites. After logging in, create ‘Clients’; each ‘Client’ can be on a separate tier. For example, some ‘Clients’ can be on the Free Tier, some on a paid tier, some on the A.I tier, etc.

Non-profits, charitable organizations, and startups: please contact me, and I can definitely increase your Free Tier limits!

User Engagement: Providing an efficient and accurate internal search feature is crucial for user retention.

WebVeta’s free tier offers the perfect solution for small websites that want to enhance their user experience without breaking the bank.

Accessibility: For sites with complex navigation, our free tier can be particularly beneficial. It allows you to integrate a robust search function across your entire site without the need for additional development costs or complexities in menu-based navigation systems.

Retain Web Visitors with Internal Search:

By integrating WebVeta’s advanced search feature into your small website using our free tier, you can:

1. Enhance User Experience: Our sophisticated search capabilities will ensure that users find what they’re looking for quickly and easily, boosting their satisfaction with your site.

2. Reduce Bounce Rates: The more easily users find what they need on your site, the less likely they are to leave in search of another source.

3. Increase Conversions: By retaining users on your site longer and ensuring that they find relevant content, you can increase the likelihood of conversions, whether that be through sales, sign-ups, or any other desired action.

Take Advantage of WebVeta

With WebVeta’s free tier, small websites now have access to a powerful search solution without the significant investment. Don’t miss out on this opportunity to elevate your user experience and take the first step towards retaining visitors and driving conversions. Get started with WebVeta today!

This blog post is designed to promote the adoption of WebVeta by small to medium-sized website owners, highlighting the benefits of our free tier in providing advanced search functionality without significant upfront costs.


Categories
WebVeta

How WebVeta Can Help Businesses Retain Visitors and Drive Conversions

As the digital landscape continues to evolve, it’s more crucial than ever for businesses to ensure that their websites are optimized for search engines (SEO) and provide a seamless user experience. However, having SEO in place is only half the battle. Even if people find your website through Google or Bing, if they can’t easily find the information they’re looking for, they’ll click away and visit another site.

Cross Post – https://www.linkedin.com/pulse/how-webveta-can-help-businesses-retain-visitors-drive-arumilli-rlzwc

In this blog post, we’ll explore the importance of effective website search and how WebVeta’s cutting-edge technology can help businesses retain visitors and drive conversions.

When users come to your website in search of information, it’s essential that they can find what they need quickly and easily. If your internal search engine or menus are clunky, confusing, or slow, they’ll likely abandon your site and visit a competitor’s instead. This not only loses you the opportunity to engage with potential customers but also harms your business in several ways:

Lost Revenue: When visitors leave your website without converting into customers, you lose revenue and valuable long-term relationships.

Reduced Brand Credibility: A poorly optimized internal search can give the impression that your company is not invested in providing a great user experience, damaging your brand’s reputation.

Increased Bounce Rates: Frustrated visitors are more likely to bounce off your site and never return, leading to higher bounce rates and reduced engagement.

How WebVeta Can Help

WebVeta is an innovative SaaS product designed to help businesses optimize their website search experience. By integrating our technology into your website, you can:

Enhance Search Functionality: With full-text search, keyword search, intent-based search, and advanced features like RAG (Retrieval Augmented Generation), WebVeta ensures that visitors can find the information they need quickly and efficiently.

Improve User Experience: By providing a seamless search experience, you’ll keep users engaged and more likely to convert into customers.

Gain a Competitive Edge: In today’s competitive digital landscape, having a cutting-edge website search engine like WebVeta can set your business apart from competitors and drive long-term growth.

Conclusion

In conclusion, effective website search is crucial for businesses seeking to retain visitors and drive conversions. By leveraging WebVeta’s innovative technology, you can optimize your website search experience, improve user engagement, and gain a competitive edge in the market.

About WebVeta

WebVeta is a cutting-edge SaaS product that provides an easily embeddable search engine for any website. Our technology utilizes AI-powered full-text search, keyword search, intent-based search, and advanced features like RAG (Retrieval Augmented Generation) to ensure visitors can find the information they need quickly and efficiently.

WebVeta is currently in beta and getting ready for General Availability in 4 to 5 days.

https://webveta.alightservices.com

I am looking for digital marketing partnerships; please contact me.

Get in Touch

To learn more about how WebVeta can help your business thrive, contact us today to schedule a demo or consultation.


Categories
A.I

An introduction to vector databases and having LLMs generate responses

Cross-Post: https://www.linkedin.com/pulse/introduction-vector-databases-having-llms-generate-arumilli-kg0tc

In previous posts, I covered what vector embeddings are, why they are needed, how to chunk text, some methods for chunking, and the importance of maintaining context. This post is an introduction to vector databases.

Several databases support storing and retrieving vector embeddings. Some dedicated vector databases include Chroma, Qdrant, Vespa, LanceDB, etc. Others, such as Solr, PostgreSQL (via the pgvector extension), MongoDB Atlas (not the free community version), and Redis, also support vectors to varying degrees.

Some important considerations when deciding on a vector database, similar to any other database:

1) Availability of a client SDK for your programming language, or the need to access the database through a microservice

2) Sharding of data, replication for higher availability, and horizontal scaling

3) Ease of making backups of databases, creating snapshots, and restoring them

4) Types of vectors supported by the database

Some databases are easy to get started with but may not be suitable for production environments that require sharding, replication, and horizontal scaling.

Once the data has been chunked, stored, and indexed, the next steps are retrieving the appropriate documents from the database based on the query (an embedding must be generated for the query as well) and passing them to an LLM.
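The retrieval step can be sketched in a few lines. This is a minimal in-memory sketch, not how any particular vector database works internally: `embed` is a hypothetical toy stand-in (word counts) for a real embedding model, `retrieve` does a linear scan where a database would use an index, and the prompt template in `build_prompt` is an assumption.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical toy 'embedding': word counts. A real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks closest to the query. A vector database would store
    chunk embeddings at indexing time and search an index instead of scanning."""
    query_vec = embed(query)  # the query needs an embedding too
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, context_chunks):
    """Assumed prompt template: answer only from the retrieved context."""
    context = "\n\n".join(context_chunks)
    return ("Answer the question using only the context below. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

chunks = [
    "WebVeta supports full-text search.",
    "Qdrant and LanceDB are vector databases.",
    "Sharding and replication help vector databases scale.",
]
question = "Which vector databases scale with sharding?"
top = retrieve(question, chunks)
print(build_prompt(question, top))
```

The string returned by `build_prompt` is what would be sent to the LLM.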

In production scenarios, a few extra steps are needed:

  1. Using tools such as LLM Guard to prevent malicious usage: a well-crafted query can be a prompt asking the LLM to override its system message. This is known as prompt injection.
  2. Having the LLM respond in a format that can be parsed (for example, JSON)
  3. Caching responses
  4. In some scenarios, running output scanners on the response
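Steps 2 and 3 can be illustrated together. This is a minimal sketch under assumptions: the LLM is prompted to reply in JSON with hypothetical `answer` and `sources` fields, and `fake_llm` is a placeholder for whatever model API is used. The cache key is a SHA-256 hash of the normalized query, so trivially different spellings of the same question (case, extra spaces) share one entry. Prompt-injection and output scanning (steps 1 and 4) are better left to dedicated tools such as LLM Guard and are not shown.

```python
import hashlib
import json

def cache_key(query):
    """Normalize case and whitespace, then hash, so equivalent queries share a key."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def parse_llm_output(raw):
    """Parse a reply expected (by our assumed prompt) to be JSON with 'answer' (str)
    and 'sources' (list). Returns None when the reply is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if not isinstance(data.get("answer"), str) or not isinstance(data.get("sources"), list):
        return None
    return data

class ResponseCache:
    """In-memory cache; a production system might use Redis with an expiry instead."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, query, call_llm):
        key = cache_key(query)
        if key in self._store:
            return self._store[key]  # cache hit: no LLM tokens spent
        parsed = parse_llm_output(call_llm(query))
        if parsed is not None:  # only cache well-formed answers
            self._store[key] = parsed
        return parsed

calls = []

def fake_llm(query):
    """Hypothetical stand-in for a real LLM call."""
    calls.append(query)
    return '{"answer": "Yes.", "sources": ["/pricing"]}'

cache = ResponseCache()
cache.get_or_compute("Is there a free tier?", fake_llm)
cache.get_or_compute("  is there a FREE tier? ", fake_llm)  # served from the cache
print(len(calls))  # 1
```

Caching only parsed, well-formed answers avoids serving a broken response repeatedly from the cache.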

WebVeta does all of this and much more, including cost controls. WebVeta can be easily integrated into any website with just a few lines of HTML!

https://webveta.alightservices.com

WebVeta is running a Kickstarter campaign: join it to get 25 to 40% discounts and a price lock for at least 3 years.

https://www.kickstarter.com/projects/kantikalyanarumilli/webveta-power-your-website-with-ai-search


Categories
A.I

Different ways of text chunking for generating embeddings

Cross-post: https://www.linkedin.com/pulse/different-ways-text-chunking-generating-embeddings-arumilli-u8hjc

In a previous post (https://www.alightservices.com/2024/10/03/an-introduction-to-text-chunking-for-purposes-of-vector-embedding/), I talked about the reason for chunking and the concept. This post goes a little deeper and is based on another person’s blog, Five Levels of Chunking Strategies in RAG| Notes from Greg’s Video | by Anurag Mishra | Medium, and on GitHub: RetrievalTutorials/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb at main · FullStackRetrieval-com/RetrievalTutorials · GitHub

For retrieval to work properly, documents need to be chunked, and embeddings need to be generated and stored in vector databases. But when we chunk documents, how do we maintain context? What if the main subject of a passage is introduced in one chunk, while a specific detail about it ends up in a different chunk that lacks that context?

There are different ways of chunking.

Split on characters such as . ! and ?, then combine the split sentences until the maximum chunk size is reached. This is fast and easy but does not retain context.

Some people overlap content between adjacent chunks to maintain some context.
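The two approaches above can be sketched together: split on sentence-ending punctuation, greedily combine sentences up to a maximum chunk size, and carry the last sentence(s) into the next chunk as overlap. This is an illustrative sketch: chunk size is measured in characters for simplicity (real systems usually count tokens with the embedding model's tokenizer), and the naive regex also splits after abbreviations such as "Mr.".

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace.
    Naive: abbreviations such as 'Mr.' are also treated as sentence ends."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_sentences(text, max_chars=200, overlap_sentences=1):
    """Greedily pack sentences into chunks of at most max_chars characters.
    The last `overlap_sentences` sentences of each chunk are repeated at the start
    of the next chunk when they fit. A single sentence longer than max_chars still
    becomes its own (oversized) chunk."""
    chunks, current = [], []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            overlap = current[-overlap_sentences:] if overlap_sentences > 0 else []
            # Drop the overlap if it would leave no room for the new sentence.
            if len(" ".join(overlap + [sentence])) > max_chars:
                overlap = []
            current = overlap
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "One is here. Two is here. Three is here."
print(chunk_sentences(text, max_chars=30))
# ['One is here. Two is here.', 'Two is here. Three is here.']
```

With `overlap_sentences=0`, the same input yields two chunks that share nothing, which is the plain split-and-combine method.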

Document-based splitting, for example for HTML: chunk the document by headers. Well-written HTML documents usually keep related content under the same header, but there is still the question of chunk size. What if a section of HTML is larger than the embedding model's max tokens?
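Header-based chunking for HTML can be sketched with Python's standard `html.parser`. This is a minimal illustration, not a production HTML splitter: a new section starts at every h1 to h6 tag, text inside script and style is skipped, and oversized sections are split into word windows (words stand in for tokens here) with the heading repeated in each piece so that every chunk keeps its context.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class HeadingSplitter(HTMLParser):
    """Collects [heading, body_parts] sections, starting a new one at each h1-h6."""

    def __init__(self):
        super().__init__()
        self.sections = [["", []]]  # text before the first heading has no title
        self._in_heading = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.sections.append(["", []])
            self._in_heading = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth or not text:
            return
        if self._in_heading:
            self.sections[-1][0] = (self.sections[-1][0] + " " + text).strip()
        else:
            self.sections[-1][1].append(text)

def chunk_html(html, max_words=100):
    parser = HeadingSplitter()
    parser.feed(html)
    parser.close()
    chunks = []
    for title, parts in parser.sections:
        words = " ".join(parts).split()
        if not title and not words:
            continue
        # Split oversized sections; repeat the heading in each piece for context.
        for start in range(0, max(len(words), 1), max_words):
            body = " ".join(words[start:start + max_words])
            chunks.append(f"{title}\n{body}".strip() if title else body)
    return chunks

html = "<h2>Pricing</h2><p>Basic tier costs less.</p><h2>Limits</h2><p>600 pages per month.</p>"
print(chunk_html(html))
```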

The above methods are easy, fast, and low-cost.

The next set of methods is more costly.

Another method uses embeddings to generate chunks, and there are several variations of it. The idea is that the similarity of two texts' embeddings indicates whether the texts are related. Split the text into small, sentence-based pieces, then append sentences to the current chunk while comparing their similarity against a relevance threshold such as 0.95; when the similarity falls below the threshold (or the chunk reaches its maximum size), start a new chunk, with some overlap if needed. This method needs many calls to the embedding model and can become costly.
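The embedding-based approach can be sketched as follows. This is a minimal illustration of the idea described above, not the implementation from the referenced notebook: `embed` is a hypothetical toy stand-in (word counts) for a real embedding model, and every `embed` call marks where a paid embedding request would happen. Useful threshold values depend on the embedding model; the toy embedder needs a much lower value (0.2 below) than the 0.95 mentioned above.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical toy embedder (word counts) standing in for a real model call."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(n * b[w] for w, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_chunks(sentences, threshold, max_sentences=5):
    """Append each sentence to the current chunk while it stays similar to the chunk
    (cosine >= threshold) and the chunk has fewer than max_sentences sentences;
    otherwise close the chunk and start a new one."""
    chunks, current = [], []
    for sentence in sentences:
        if current:
            similarity = cosine(embed(" ".join(current)), embed(sentence))
            if similarity < threshold or len(current) >= max_sentences:
                chunks.append(" ".join(current))
                current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

sentences = [
    "Qdrant is a vector database.",
    "A vector database stores embeddings.",
    "Our office opens at nine.",
    "The office closes at five.",
]
for chunk in semantic_chunks(sentences, threshold=0.2):
    print(chunk)
```

Here the topic change between the second and third sentences produces a chunk boundary, while each pair of related sentences stays together.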

Some text documents discuss Topic A, then Topic B, and then return to Topic A. The methods discussed so far don’t handle this well.

This can be handled using either embedders or LLMs: split the text into smaller pieces, then group related pieces together even when they are not adjacent. This is very costly because it uses LLMs. Greg Kamradt (https://twitter.com/GregKamradt) has explained these concepts and provided Python code samples in his GitHub repo.
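An embedding-based version of this grouping can be sketched as follows. This is not the LLM-based approach from the referenced repository; it is a simpler greedy approximation in which each piece joins the most similar existing group (or opens a new one), so a piece about Topic A that appears after Topic B can still rejoin Topic A. As before, `embed` is a hypothetical toy stand-in for a real embedding model, and the 0.3 threshold only suits that toy.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical toy embedder (word counts) standing in for a real model call."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(n * b[w] for w, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_by_topic(pieces, threshold):
    """Assign each piece to the most similar existing group (compared with the group's
    combined text). If no group reaches the threshold, start a new group.
    Groups are returned in order of their first piece."""
    groups = []
    for piece in pieces:
        vec = embed(piece)
        best_index, best_score = None, 0.0
        for i, group in enumerate(groups):
            score = cosine(embed(" ".join(group)), vec)
            if score > best_score:
                best_index, best_score = i, score
        if best_index is not None and best_score >= threshold:
            groups[best_index].append(piece)
        else:
            groups.append([piece])
    return [" ".join(group) for group in groups]

pieces = [
    "Vector databases store embeddings.",             # Topic A
    "Our office opens at nine.",                      # Topic B
    "Embeddings in vector databases enable search.",  # back to Topic A
]
print(group_by_topic(pieces, threshold=0.3))
```

The cost grows with the number of pieces times the number of groups, which hints at why doing the same with LLM calls gets expensive.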

WebVeta does the chunking, embedding, storing, retrieving, and calling of LLMs to generate responses, and it can be easily embedded in your website with 2 to 3 lines of HTML! SMEs can focus on their business goals and still provide an advanced internal search engine for people who come to their website looking for more information.

WebVeta is currently running a Kickstarter campaign: https://www.kickstarter.com/projects/kantikalyanarumilli/webveta-power-your-website-with-ai-search

Or contact me for a free trial of WebVeta.


Categories
A.I

An introduction to text chunking, embeddings

Cross Post – https://www.linkedin.com/pulse/introduction-text-chunking-embeddings-kanti-kalyan-arumilli-hvifc

In my previous post, I talked about text chunking. In this blog post, let’s look at text embeddings:

  1. Text Embeddings of various output lengths.
  2. Late Interaction embeddings

A standard text embedding is a single array of floats for the whole text, so one vector cannot always capture every aspect of the text's meaning. Late interaction embeddings (for example, ColBERT) are an array of arrays: one vector per token, which lets a query be matched against a document in a more fine-grained way.
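The difference in scoring can be sketched as follows. This illustration uses tiny made-up 2-dimensional vectors (real models produce much longer vectors); `maxsim_score` follows the MaxSim rule used by ColBERT-style late interaction, where each query token vector is matched to its most similar document token vector and those best matches are summed.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def single_vector_score(query_vec, doc_vec):
    """Standard dense embedding: one vector per text, compared by cosine similarity."""
    return dot(normalize(query_vec), normalize(doc_vec))

def maxsim_score(query_token_vecs, doc_token_vecs):
    """Late interaction (ColBERT-style MaxSim): each query token vector takes its best
    cosine match among the document's token vectors; the best matches are summed."""
    doc = [normalize(d) for d in doc_token_vecs]
    return sum(max(dot(normalize(q), d) for d in doc) for q in query_token_vecs)

# Made-up 2-dimensional token vectors, purely for illustration.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(maxsim_score(query_tokens, doc_tokens))  # 2.0: each query token has an exact match
```

The trade-off is storage and compute: late interaction keeps one vector per token instead of one per chunk.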

Once we chunk documents, we create embeddings of the text and store the chunks along with the embeddings in vector databases.

For retrieval, generate an embedding of the query text and query the database: the vector database returns the documents whose stored embeddings are closest to the query's embedding.

Later, I will write a blog post about some vector databases. There are a few different measures for comparing embeddings (such as cosine similarity, dot product, and Euclidean distance), but for text embeddings cosine similarity is usually a good choice.
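The three measures behave differently when vectors point the same way but have different lengths. A small sketch with illustrative values:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Angle-based: ignores vector length. 1.0 means the same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice as long
print(cosine_similarity(a, b))   # ~1.0: identical direction
print(dot(a, b))                 # 28.0: grows with vector length
print(euclidean_distance(a, b))  # ~3.742: penalizes the length difference
```

For vectors normalized to unit length, the dot product equals cosine similarity, and ranking by Euclidean distance gives the same order, since the squared distance between unit vectors is 2 minus 2 times their cosine similarity.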

In the last blog post, I mentioned that chunks can lose contextual information. Certain strategies can add context back into chunks, and cleaning the chunks before generating embeddings is usually effective as well.
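One simple way to combine both ideas is to clean each chunk and prepend document-level context before embedding it. This is a sketch of one possible strategy, not a standard: the "Title: ... | Section: ..." header format is an assumption, and the tag-stripping regex is naive (a real pipeline would extract text with an HTML parser).

```python
import re
import unicodedata

def clean_chunk(text):
    """Normalize Unicode (NFKC), drop leftover HTML tags, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"<[^>]+>", " ", text)  # naive tag removal
    return re.sub(r"\s+", " ", text).strip()

def contextualize(chunk, doc_title, section=None):
    """Prepend document-level context so an isolated chunk still says what it is about.
    The header format used here is an assumption for illustration."""
    header = f"Title: {doc_title}"
    if section:
        header += f" | Section: {section}"
    return f"{header}\n{clean_chunk(chunk)}"

print(contextualize("He  founded <b>two</b> companies.", "Kanti Kalyan Arumilli", "Career"))
# Title: Kanti Kalyan Arumilli | Section: Career
# He founded two companies.
```

The embedding is then generated from the contextualized text, so a chunk that only says "he" still carries who "he" is.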

The next 4 posts will cover: different vector databases, retrieving and reranking documents, some options for running LLMs, and generating RAG responses.

Or, if you have a website or blog and need an advanced A.I-based, RAG-ready search engine without the hassle of the above, use WebVeta. WebVeta is ideal for keyword search, full-text search, and even RAG. RAG queries and responses are cached, which reduces LLM token costs.

Sign-up: https://webveta.alightservices.com/

WebVeta Blog – https://blog.alightservices.com/search/label/WebVeta

WebVeta Youtube Playlist – https://www.youtube.com/playlist?list=PLs7D8ybThnZhi9Sx17ft9_vRz5J0S_ZSZ


Categories
RAG

An introduction to text chunking for purposes of vector embedding.

Cross post – https://www.linkedin.com/pulse/introduction-text-chunking-purposes-vector-embedding-arumilli-zf6vc

In a previous post, “An introduction to vector embeddings”, I discussed vector embeddings and posted a link to the MTEB leaderboard. In this article, I am going to discuss “chunking”.

WebVeta does this and much more in the A.I tier: you can sign up, add your domains and blogs, validate ownership of your websites, adjust a few settings, copy and paste 2 to 3 lines of HTML, and relax. WebVeta is running a Kickstarter campaign that offers significant discounts.

In the MTEB leaderboard, the 6th column is Max Tokens: the maximum number of tokens the embedding model accepts as input.

When an embedding model has a small max-token limit and we want to store large texts, we need to chunk the text.

In certain cases, even if an embedding model accepts a large number of tokens, we still need to chunk when we want to find specific information in a text document. Let’s say I have a document with 30,000 tokens. If I need the entire document when searching, I can use a model with a 32k max-token limit. But what if I want just the small paragraph that has the information and don’t care about the rest of the document? To handle this kind of look-up of specific information, we can split documents into smaller parts and index the chunks rather than the whole document.

For example, “Mr. Kanti Kalyan Arumilli is the founder and CEO of ALight Technology And Services Limited and ALight Technologies USA Inc” is approximately 33 tokens (the exact count depends on the tokenizer).
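Splitting by token count can be sketched as a sliding window over a token list. Assumption: the input is already tokenized; a real system would use the embedding model's own tokenizer, and splitting on whitespace only approximates tokens (subword tokenizers usually produce more tokens than words). The window size and overlap are illustrative parameters.

```python
def token_windows(tokens, max_tokens, overlap=0):
    """Split a token list into windows of at most max_tokens tokens, each sharing
    `overlap` tokens with the previous window."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return windows

# Crude approximation: whitespace-separated words treated as tokens.
words = "the quick brown fox jumps over the lazy dog".split()
print(token_windows(words, max_tokens=4, overlap=1))

# A 30,000-token document split into 512-token chunks without overlap:
print(len(token_windows(list(range(30_000)), max_tokens=512)))  # 59
```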

There could be other use-cases for chunking.

But chunking loses context. In a large document about a person, for example, the entire text makes it clear who the document is about. When we use chunks, a paragraph might use the term ‘he’ / ‘him’, and when that paragraph is seen in isolation, we lose the context. Chunks that are too small lose context, while chunks that are too big are not useful for pinpointing specific information.

Various approaches can be used to retain or add context, and there is no absolute answer to what the best chunk size is. I have seen decent results with chunk sizes between 384 and 1,024 tokens.

I am planning a few more blog posts over the next few weeks on LLMs, RAG, vector databases, storing, retrieving, etc.


Categories
A.I Artificial Intelligence NLP

An introduction to vector embeddings

Have you ever wondered how machines truly understand the meaning of words? While they lack the nuanced comprehension we humans possess, they’ve learned to represent words as numerical vectors, opening a world of possibilities for tasks like search, translation, and even creative writing.

Cross Post – https://www.linkedin.com/pulse/introduction-vector-embeddings-kanti-kalyan-arumilli-wqg8c

Imagine each word in the English language as a point in a multi-dimensional space. Words with similar meanings cluster together, while those with contrasting meanings reside farther apart. This is the essence of vector embeddings: representing words as numerical vectors, capturing semantic relationships between them.

These vectors aren’t arbitrary; they’re learned through sophisticated machine learning algorithms trained on massive text datasets.

Vector embeddings revolutionize how machines process language because:

Semantic Similarity: Words with similar meanings have vectors that are close together in the “semantic space.” This allows machines to identify synonyms and related words, and even subtle relationships between words.

Contextual Understanding: Capturing the nuanced meaning of a word based on its surrounding words.

Improved Performance: Embedding vectors as input to machine learning models often leads to significant performance gains in tasks like text classification, sentiment analysis, machine translation and neural search.

There are various types, such as dense, sparse, and late interaction. Within each type, there are several models trained and fine-tuned on different datasets, and their computational costs and requirements differ significantly: some models need a lot of CPU and memory yet underperform, while others need less CPU and memory and still perform well. That said, among models trained on the same datasets, those trained on more tokens usually outperform those trained on fewer tokens.
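The difference between dense and sparse representations is easiest to see in how they are stored and compared. A minimal sketch; the term weights below are made up for illustration (real sparse models learn them from data):

```python
def dense_dot(a, b):
    """Dense vectors: every dimension has a value (typically hundreds to thousands of floats)."""
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(a, b):
    """Sparse vectors: only non-zero dimensions are stored, here as {term: weight}.
    Iterate over the smaller dict and look up matches in the larger one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[term] for term, weight in a.items() if term in b)

# Hypothetical term weights, purely for illustration.
query_sparse = {"vector": 1.2, "database": 0.8}
doc_sparse = {"vector": 0.9, "database": 1.1, "sharding": 0.5}
print(sparse_dot(query_sparse, doc_sparse))  # about 1.96 (1.2*0.9 + 0.8*1.1)
```

Sparse vectors only score documents that share terms with the query, while dense vectors can also match paraphrases with no words in common.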

Here is a very interesting link – https://huggingface.co/spaces/mteb/leaderboard

The above page lists many models, along with their memory requirements, scores on various tasks, embedding sizes, etc. Many of these models are free under open licenses such as MIT or Apache 2.0, and some are commercial.

In the past, I wrote a blog post (https://www.alightservices.com/2024/04/27/how-to-get-text-embeddings-from-meta-llama-using-c-net/) about converting Llama 2 / 3 into GGUF and interacting with it using C#.

Many of the free models in the above leaderboard have GGUF versions and can be used directly from C#, or through a free local HTTP server for embeddings such as Ollama or llama.cpp. Some models don’t have GGUF versions; some of these can probably be converted, and others might not. Some models are available in ONNX format, and some might need Python code to generate embeddings. I have tried IronPython, but I don’t recommend IronPython or other third-party wrappers because they are less reliable. Here is a blog post about integrating Python with .Net: https://www.alightservices.com/2024/04/09/c-net-python-and-nlp-natural-language-processing/
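For the local HTTP server route, a request to Ollama's embedding endpoint can be sketched with only the Python standard library. Assumptions: Ollama is running locally on its default port 11434, an embedding model such as nomic-embed-text has been pulled, and your Ollama version provides the `/api/embed` endpoint (which returns an `embeddings` list); older versions expose `/api/embeddings` with a `prompt` field instead. The same JSON request could be sent from C# with HttpClient.

```python
import json
import urllib.request

# Assumed default local Ollama endpoint; the model must already be pulled,
# e.g. with `ollama pull nomic-embed-text`.
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"

def build_embed_request(text, model="nomic-embed-text", url=OLLAMA_EMBED_URL):
    """Build (but do not send) a JSON POST request for Ollama's embedding endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def get_embedding(text, model="nomic-embed-text", url=OLLAMA_EMBED_URL, timeout=30.0):
    """Send the request and return the first embedding (a list of floats)."""
    request = build_embed_request(text, model, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["embeddings"][0]

# Requires a running Ollama server:
# vector = get_embedding("What is WebVeta?")
# print(len(vector))
```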

Proud partner of Microsoft for Startups


Categories
Security

Some SSH tips

This blog post is based on what I have learned over the past year, since my servers were hacked. It contains tips from notes I took while reading various web pages and blogs.

This is not an exhaustive list. I am NOT a Linux admin, but I thought these tips could help some people.

In the /etc/ssh/sshd_config file:

sudo nano /etc/ssh/sshd_config

The following are some important settings:

Include /etc/ssh/sshd_config.d/*.conf

PermitRootLogin no
PasswordAuthentication no
PermitEmptyPasswords no
AllowUsers <user>
Protocol 2
DenyUsers root
MaxSessions 1

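Don’t allow password login, don’t allow root login, don’t allow empty passwords, specify which users are allowed, and use version 2 of the SSH protocol (current OpenSSH releases only support version 2, so that line mainly documents intent). MaxSessions limits the number of shell, login, or subsystem (e.g. sftp) sessions per network connection, which matters when a client multiplexes several sessions over one connection; it does not limit how many separate logins the server accepts at the same time.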
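Because a single mistake can lock you out, it helps to check the effective settings before reloading sshd. `sudo sshd -t` validates the configuration, and `sudo sshd -T` prints the effective values. The Python sketch below only illustrates how sshd resolves repeated keywords: per the sshd_config manual, the first value obtained for a keyword is used. That rule is also why drop-in files pulled in by an Include line at the top of the file take precedence over later lines. The sketch does not follow Include files, stops at the first Match block, and does not handle the "Keyword=value" form; the EXPECTED table is an assumption mirroring the settings above.

```python
# Hardened values from the settings above (an assumption; adjust to your policy).
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "permitemptypasswords": "no",
}

def effective_settings(config_text):
    """Approximate sshd's resolution: keywords are case-insensitive and the first
    value obtained for a keyword wins. Simplifications: stops at the first 'Match'
    block, does not follow 'Include', and does not handle 'Keyword=value'."""
    settings = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "match":
            break  # conditional blocks are out of scope for this sketch
        settings.setdefault(keyword, value)  # first value wins
    return settings

def audit(config_text, expected=EXPECTED):
    """Return a message for each setting whose effective value is not the expected one."""
    settings = effective_settings(config_text)
    return [
        f"{keyword}: expected {want!r}, found {settings.get(keyword)!r}"
        for keyword, want in expected.items()
        if settings.get(keyword, "").lower() != want
    ]

# The first PasswordAuthentication line wins, and PermitEmptyPasswords is missing.
print(audit("PermitRootLogin no\nPasswordAuthentication yes\nPasswordAuthentication no\n"))
```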

The following can go in a separate file under /etc/ssh/sshd_config.d/, because files in that directory are pulled in by the Include line above.

sudo nano /etc/ssh/sshd_config.d/ssh-audit_hardening.conf

The following configuration allows only strong ciphers, MACs, and host/public key algorithms:


Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com
MACs hmac-sha2-512-etm@openssh.com
HostKeyAlgorithms sk-ssh-ed25519-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,sk-ssh-ed25519@openssh.com,ssh-ed25519,rsa-sha2-512
CASignatureAlgorithms sk-ssh-ed25519@openssh.com,ssh-ed25519,rsa-sha2-512
HostbasedAcceptedAlgorithms sk-ssh-ed25519-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,sk-ssh-ed25519@openssh.com,ssh-ed25519,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-512
PubkeyAcceptedAlgorithms sk-ssh-ed25519-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,sk-ssh-ed25519@openssh.com,ssh-ed25519,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-512,rsa-sha2-256-cert-v01@openssh.com

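In the configuration above, the ciphers use 256-bit keys, the MAC uses SHA-512, and the signature algorithms are Ed25519 or RSA with SHA-2 (SHA-256 or SHA-512). Weaker options such as 128-bit ciphers, SHA-1 based MACs, and SHA-1 RSA signatures (ssh-rsa) are left out.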
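A small helper can catch two common mistakes in algorithm lists like the ones above: stray commas that leave an empty entry, and names that include a weaker primitive. This is a heuristic sketch; the WEAK_MARKERS tuple is an assumption for illustration, not an official policy, and the ssh-audit tool is a better way to assess a running server.

```python
# Heuristic substrings for primitives weaker than those in the configuration above.
# This list is an assumption for illustration, not an official policy.
WEAK_MARKERS = ("cbc", "sha1", "md5", "3des", "arcfour", "128", "ssh-rsa", "ssh-dss")

def check_algorithm_list(value):
    """Check a comma-separated sshd algorithm list for empty entries and weak names."""
    problems = []
    for name in value.split(","):
        name = name.strip()
        if not name:
            problems.append("empty entry (stray comma)")
        elif any(marker in name for marker in WEAK_MARKERS):
            problems.append(f"weak: {name}")
    return problems

print(check_algorithm_list("sk-ssh-ed25519@openssh.com,ssh-ed25519,rsa-sha2-512,"))
# ['empty entry (stray comma)']
print(check_algorithm_list("aes128-ctr,aes256-gcm@openssh.com"))
# ['weak: aes128-ctr']
```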
