A majority of enterprises can't apply generative AI out of the box because their applications need knowledge external to the LLM - such as their proprietary enterprise data - and they cannot tolerate hallucinations. Retrieval-Augmented Generation (RAG) addresses these challenges, which has made it a game-changer in information retrieval. However, the traditional RAG architecture struggles to accurately match questions to relevant information.
What many overlook is how you obtain the embeddings RAG needs: during setup, you must generate embeddings for your proprietary data, and at query time, for each incoming query. Vector databases typically don't generate embeddings themselves; they expect you to upload your data with embeddings already attached. Producing those embeddings can become a significant challenge.
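For concreteness, here is a minimal sketch of that "bring your own embeddings" workflow, assuming an open-source sentence-transformers model; the documents, model choice, and final similarity step are illustrative, not any specific product's interface:

```python
# Minimal sketch of the workflow a vector database expects: you generate
# vectors for your documents at setup time and for each query at query time.
# Model choice and data are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any open-source embedding model

# Setup time: embed your proprietary documents once.
documents = [
    "Invoices are processed within 30 days of receipt.",
    "Support tickets are triaged by severity, then by age.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

# Query time: embed the incoming question with the same model.
query_vector = model.encode(["How long does invoice processing take?"],
                            normalize_embeddings=True)

# Stand-in for the vector DB's similarity search (cosine, since vectors are
# normalized); in practice the database performs this step for you.
scores = doc_vectors @ query_vector.T
print(documents[int(np.argmax(scores))])
```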
Therefore, many turn to providers of standard LLMs, open-source or commercial, for embeddings. You can download open-source models into your own environment, but you'll need to build a process to generate embeddings - and that can be challenging at scale. By contrast, commercial providers like OpenAI, Anthropic, Cohere, and Mistral offer APIs for embedding generation, typically with per-token fees. Both options - downloading open-source models or using commercial APIs - come at increased system complexity: a downloaded model needs to be hosted somewhere, and with APIs, embedding generation is under someone else's control.
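The hosted-API route looks roughly like this; OpenAI is shown as one example, and other providers expose similar endpoints:

```python
# Sketch of the hosted-API alternative: embedding generation runs on the
# provider's infrastructure and is billed per token.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Invoices are processed within 30 days of receipt."],
)
embedding = response.data[0].embedding  # a list of floats to store in your vector DB
print(len(embedding))
```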
This blog introduces a novel platform that provides embeddings for both structured and unstructured data, and an improved RAG system that learns to associate questions with correct answers directly from your proprietary data. We describe how Featrix makes this process simpler and more accurate than traditional approaches to RAG.
Conventional RAG systems typically use the same model to create embeddings for both text and questions. This approach can lead to mismatches, as the language used in questions can differ significantly from the language in the relevant text. As a result, query embeddings may not align closely with the embeddings of the most relevant text passages.
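One way to see this gap yourself is to embed a question, the passage that answers it, and a paraphrase of that passage with the same model, then compare similarities. A minimal sketch, with the caveat that exact scores depend on the model; declarative passages and interrogative queries often land further apart than two paraphrased passages do:

```python
# Observe the question/passage gap: embed a question and the passage that
# answers it with one shared model and inspect the cosine similarities.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

question = "How long does invoice processing take?"
answer_passage = "Invoices are processed within 30 days of receipt."
paraphrase = "Receipt-to-processing time for invoices is one month."

q, a, p = model.encode([question, answer_passage, paraphrase])
print("question vs. passage:  ", util.cos_sim(q, a).item())
print("paraphrase vs. passage:", util.cos_sim(p, a).item())
```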
To address this problem, some advanced RAG implementations employ multiple LLMs to supervise each other, hypothesizing answers and using those hypotheses to identify relevant text. While effective, this approach is complex and computationally expensive, and accuracy can suffer when the structure of the documents of interest differs significantly from that of the query.
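For reference, a stripped-down sketch of that hypothesize-then-retrieve pattern; the model names are illustrative, and the extra LLM call per query is where the added cost and complexity live:

```python
# Sketch of the "hypothesize an answer first" pattern: an LLM drafts a
# plausible answer, and that draft - not the raw question - is embedded and
# matched against the corpus. Model names are illustrative.
from openai import OpenAI
from sentence_transformers import SentenceTransformer

llm = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")

question = "How long does invoice processing take?"

# Extra LLM call per query: the source of the added cost and complexity.
draft = llm.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": f"Write a short passage that would answer: {question}"}],
).choices[0].message.content

# The hypothetical answer resembles corpus passages more closely than the
# question does, so it is embedded in the question's place.
query_vector = embedder.encode([draft], normalize_embeddings=True)
```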
Rather than coordinating multiple LLMs to bridge the gap between document embeddings and possible questions, a more elegant solution is a joint embedding space over questions and answers. This approach is simpler and more accurate than methods that rely on multiple supervising LLMs.
Here's how it works:
1. During setup, questions and the passages or records that answer them are embedded into a single joint space, trained so that each question lands near its answer.
2. At query time, the incoming question is embedded into that same space.
3. Retrieval becomes a nearest-neighbor lookup: the closest answer embeddings are returned as context for generation.
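To make the training idea concrete, here is a toy sketch of a dual encoder trained with an in-batch contrastive loss, so each question is pulled toward its own answer and pushed away from the others in the batch. This illustrates the general technique, not Featrix's internal implementation:

```python
# Toy dual encoder with an in-batch contrastive objective. A real system
# would use pretrained transformer encoders instead of EmbeddingBag.
import torch
import torch.nn.functional as F
from torch import nn

class Encoder(nn.Module):
    """Toy text encoder: averages token embeddings and L2-normalizes."""
    def __init__(self, vocab_size=30522, dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids, offsets):
        return F.normalize(self.embed(token_ids, offsets), dim=-1)

question_encoder = Encoder()
answer_encoder = Encoder()
optimizer = torch.optim.Adam(
    list(question_encoder.parameters()) + list(answer_encoder.parameters()),
    lr=1e-3,
)

def contrastive_step(q_ids, q_offsets, a_ids, a_offsets, temperature=0.05):
    q = question_encoder(q_ids, q_offsets)   # (batch, dim)
    a = answer_encoder(a_ids, a_offsets)     # (batch, dim)
    logits = q @ a.T / temperature           # pairwise similarities
    labels = torch.arange(q.size(0))         # i-th question matches i-th answer
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One toy batch: two (question, answer) pairs, tokens as arbitrary ids.
q_ids = torch.tensor([1, 2, 3, 4, 5]); q_off = torch.tensor([0, 3])
a_ids = torch.tensor([6, 7, 8, 9]);    a_off = torch.tensor([0, 2])
print(contrastive_step(q_ids, q_off, a_ids, a_off))
```

The in-batch labels mean every other answer in the batch serves as a negative example, which is what pulls questions and their answers into one shared space while keeping unrelated pairs apart.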
Additional structured data is useful in domains where questions refer to tabular records - customer tables, transaction logs, and the like. The figure below shows the system architecture for this new variant of RAG, built on joint embedding spaces.
Featrix can handle both unstructured and structured data, and lets you create joint embedding spaces - the key components of this advanced approach to RAG.
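One common, generic way to fold tabular records into the same space is to serialize each row into a short statement and embed it like any passage; Featrix learns embeddings for structured data directly, so the sketch below is a stand-in for intuition, not its method:

```python
# Generic stand-in: serialize rows into text so they share an embedding
# space with passages. (Featrix embeds structured data natively; this is
# only an illustration of the idea.)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

rows = [
    {"customer": "Acme Corp", "open_tickets": 4, "plan": "enterprise"},
    {"customer": "Globex", "open_tickets": 0, "plan": "starter"},
]
serialized = [
    f"Customer {r['customer']} is on the {r['plan']} plan "
    f"with {r['open_tickets']} open tickets."
    for r in rows
]
row_vectors = model.encode(serialized, normalize_embeddings=True)
```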
For integration with your application, the custom Featrix model is accessed through an API. You can host the model and API yourself, or let us host them for you.
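From the application side, integration can be as simple as a single HTTP call. The endpoint, payload shape, and field names below are hypothetical placeholders, not the actual Featrix API; consult the product documentation for the real interface:

```python
# Hypothetical integration sketch: endpoint URL and JSON fields are
# placeholders, not the real Featrix API.
import requests

FEATRIX_URL = "https://your-host.example.com/embed"  # self-hosted or Featrix-hosted

def embed_query(question: str) -> list[float]:
    response = requests.post(FEATRIX_URL, json={"input": question}, timeout=10)
    response.raise_for_status()
    return response.json()["embedding"]  # hypothetical response field
```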
By leveraging Featrix to create a joint embedding space for questions and answers, you can significantly improve the accuracy and efficiency of RAG systems. This approach simplifies the architecture while providing a more direct and accurate mapping between queries and relevant information.
Whether you're working with unstructured text, structured data, or a combination of both, Featrix offers a powerful tool for creating custom embedding spaces that can revolutionize your RAG implementations. As we continue to push the boundaries of AI and information retrieval, approaches like this will be key to unlocking new possibilities not only in natural language understanding and generation, but also in analytics.