Most enterprises don’t use generative AI out of the box: their applications need knowledge that lies outside the LLM, such as proprietary enterprise data, and they cannot tolerate hallucinations. Retrieval-Augmented Generation (RAG) addresses both challenges, which is why it has become a game-changer in information retrieval. However, the traditional RAG architecture struggles to accurately match questions to the relevant information.
What many overlook is how you obtain the embeddings RAG needs: at setup time, you must generate embeddings for your proprietary data, and at query time, for each query. Vector databases typically don’t generate embeddings; they expect you to upload your data together with its embeddings. Obtaining those embeddings can become a significant challenge.
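A minimal sketch of the two points at which embeddings are needed. It assumes a toy `embed` function (sparse bag-of-words counts) in place of a real embedding model and an in-memory list in place of a vector database; both are illustrative stand-ins, not any particular product's API.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing tokens, so only shared tokens contribute
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Setup time: you compute a vector for every chunk and upload (text, vector) pairs;
# the "vector store" here is just an in-memory list.
documents = [
    "Invoices are payable within 30 days of receipt.",
    "Support tickets are answered within one business day.",
]
vector_store: List[Tuple[str, Counter]] = [(doc, embed(doc)) for doc in documents]

# Query time: the query must be embedded too, with the same model, before searching.
def search(query: str, k: int = 1) -> List[str]:
    q = embed(query)
    ranked = sorted(vector_store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(search("When are invoices payable?"))
# ['Invoices are payable within 30 days of receipt.']
```

Whatever model you choose, it has to be available at both moments: once per chunk when you build the index, and once per query at runtime.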
Therefore, many turn to providers of standard LLMs, open-source or commercial, for embeddings. While you can download open-source models into your own environment, you'll need to build a process to generate embeddings, and that can be challenging at scale. By contrast, commercial providers such as OpenAI, Cohere, and Mistral offer APIs for embedding generation, typically for per-token fees. Both options, downloading open-source models or calling commercial APIs, increase system complexity: the model you downloaded needs to be hosted somewhere, and with APIs, embedding generation is under someone else’s control.
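If you self-host an open-source model, the process you need to build typically involves at least batching and caching. The sketch below is a hypothetical helper, `EmbeddingPipeline`, that sends texts to the model in fixed-size batches and caches vectors by content hash so unchanged chunks are not re-embedded. `embed_batch` is an assumed callable wrapping whatever model you host, and `fake_model` merely records batch sizes.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]

class EmbeddingPipeline:
    """Hypothetical helper: batches texts and caches vectors by content hash."""

    def __init__(self, embed_batch: Callable[[Sequence[str]], List[Vector]], batch_size: int = 32):
        self.embed_batch = embed_batch  # e.g. a self-hosted model's batch inference call
        self.batch_size = batch_size
        self.cache: Dict[str, Vector] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        # Skip texts already in the cache and deduplicate the rest, preserving order
        pending = list(dict.fromkeys(t for t in texts if self._key(t) not in self.cache))
        for start in range(0, len(pending), self.batch_size):
            batch = pending[start:start + self.batch_size]
            vectors = self.embed_batch(batch)
            if len(vectors) != len(batch):
                raise ValueError("embed_batch returned the wrong number of vectors")
            for text, vec in zip(batch, vectors):
                self.cache[self._key(text)] = vec
        return [self.cache[self._key(t)] for t in texts]

calls: List[int] = []

def fake_model(batch: Sequence[str]) -> List[Vector]:
    # Stand-in for real model inference: records batch sizes, returns 1-d "vectors"
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]

pipeline = EmbeddingPipeline(fake_model, batch_size=2)
vectors = pipeline.embed(["a", "bb", "ccc", "a"])  # "a" is embedded only once
pipeline.embed(["bb", "dddd"])                     # only "dddd" is new
print(calls)  # [2, 1, 1]
```

A production pipeline would also need retries, rate limiting, and a persistent cache, which this sketch omits.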
This blog post introduces Featrix, a platform that provides embeddings for both structured and unstructured data, and an improved RAG approach that learns to associate questions with correct answers directly from your proprietary data. We describe how Featrix makes this process simpler and more accurate than traditional RAG.
The Challenge with Traditional RAG
Conventional RAG systems typically use the same model to create embeddings for both text and questions. This approach can lead to mismatches, as the language used in questions can differ significantly from the language in the relevant text. As a result, query embeddings may not align closely with the embeddings of the most relevant text passages.
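The mismatch is easy to reproduce with a deliberately crude model. The sketch below uses bag-of-words counts as a toy stand-in for an embedding model, and the query and passages are made-up examples. Real embedding models relate words such as "vacation" and "paid leave" far better, so this exaggerates the effect, but the mechanism is the same: the query vector lands closest to passages phrased like the query, not necessarily to the passage that answers it.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

query = "How much vacation time do new employees get?"
relevant = "Staff members accrue fifteen days of paid leave per year."  # answers the query
distractor = "New employees get a laptop and badge on their first day."  # shares wording only

q = embed(query)
print(round(cosine(q, embed(relevant)), 3))    # 0.0: no shared words
print(round(cosine(q, embed(distractor)), 3))  # 0.32: the wrong passage scores higher
```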
To address this problem, some advanced RAG implementations employ multiple LLMs to supervise each other, hypothesizing answers and using these to identify relevant text. While effective, this approach can be complex and computationally expensive, and accuracy likely suffers when the documents of interest are structured very differently from the queries.
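The approach described above resembles the technique often called Hypothetical Document Embeddings (HyDE): a generative model drafts an answer, and the draft, rather than the raw question, is embedded for search. A minimal sketch, assuming the same toy bag-of-words embedding as a stand-in for a real model and a hypothetical `fake_llm` function in place of a real generation call:

```python
import math
import re
from collections import Counter
from typing import Callable, List

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hypothetical_answer_search(query: str, passages: List[str],
                               generate: Callable[[str], str], k: int = 1) -> List[str]:
    """Embed an LLM-drafted answer instead of the raw query, then rank passages by it."""
    draft = generate(query)  # one extra generation call on every query
    draft_vec = embed(draft)
    return sorted(passages, key=lambda p: cosine(draft_vec, embed(p)), reverse=True)[:k]

def fake_llm(query: str) -> str:
    # Hypothetical stand-in for a generative model; a real draft may be wrong or off-style
    return "Employees accrue paid leave days each year."

relevant = "Staff members accrue fifteen days of paid leave per year."
distractor = "New employees get a laptop and badge on their first day."
print(hypothetical_answer_search("How much vacation time do new employees get?",
                                 [relevant, distractor], fake_llm))
# ['Staff members accrue fifteen days of paid leave per year.']
```

The extra generation call on every query is where the added cost and latency come from, and if the drafted answer is phrased unlike the documents, retrieval suffers again.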
Question-Answer RAG with Joint Embeddings
Instead of having multiple LLMs generate hypothetical answers to embed and compare against questions, a more elegant solution is a single joint embedding space over questions and answers. This approach is simpler, and yields more accurate retrieval, than methods that rely on multiple supervising LLMs.
Here's how it works:
- Data Preparation:
I. Create a table with two columns: one for questions and another for the text these questions pertain to.
II. Iterate over the text of interest, chunking it into sentences or paragraphs.
III. Use an LLM (or human annotators) to generate questions about each text chunk.
IV. (Optional) Structured tabular data can be added without explicit preparation.
- Training the Embedding Space: Train a multimodal embedding space where regular text is one "mode" and the questions pertaining to this text are the other "mode". This creates an embedding space that directly maps questions to relevant text.
- Retrieval: Proceeds as in traditional RAG: you embed the query, and vector search finds close matches in the joint embedding space.
- Generation: Finally, as with traditional RAG, use a generative model to produce a conversational answer from the retrieved text.
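The data preparation and retrieval steps above can be sketched in Python. This is a simplified stand-in, not Featrix's method: instead of training a joint embedding space, it links each generated question to its source chunk and matches incoming queries against those questions, using a toy bag-of-words similarity. It shows the shape of the two-column table from step I and why question-side representations help. `generate_questions` returns canned, hypothetical output in place of an LLM call, and `build_prompt` is an illustrative prompt template for the generation step.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def chunk_sentences(text: str) -> List[str]:
    """Step II: split text into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Step III (hypothetical): canned output standing in for LLM-generated questions
FAKE_LLM_QUESTIONS: Dict[str, List[str]] = {
    "Staff members accrue fifteen days of paid leave per year.": [
        "How much vacation time do employees get?",
        "How many days of paid leave do staff accrue?",
    ],
    "New employees get a laptop and badge on their first day.": [
        "What equipment do new employees receive?",
    ],
}

def generate_questions(chunk: str) -> List[str]:
    return FAKE_LLM_QUESTIONS.get(chunk, [])

def build_table(text: str) -> List[Dict[str, str]]:
    """Step I: one row per (question, text) pair."""
    return [{"question": q, "text": chunk}
            for chunk in chunk_sentences(text)
            for q in generate_questions(chunk)]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, table: List[Dict[str, str]]) -> str:
    """Match the query against the question column and return the linked text."""
    q = embed(query)
    best = max(table, key=lambda row: cosine(q, embed(row["question"])))
    return best["text"]

def build_prompt(query: str, context: str) -> str:
    """Generation step: the retrieved text becomes context for a generative model."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

document = ("Staff members accrue fifteen days of paid leave per year. "
            "New employees get a laptop and badge on their first day.")
table = build_table(document)
print(retrieve("How much vacation time do new employees get?", table))
# Staff members accrue fifteen days of paid leave per year.
```

Compared with the raw-passage search that fails on the same query, matching against generated questions recovers the right chunk. A trained joint embedding space is intended to go further, replacing the word-overlap comparison with learned vectors so that queries can reach relevant text even when they share no wording with any generated question.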
Adding structured data is useful in domains where questions may refer to tabular data. The figure below shows the system architecture for this new variant of RAG, which builds on joint embedding spaces.
Advantages of Joint Embeddings
- Higher relevance: A direct mapping between questions and relevant text improves retrieval accuracy.
- Simplicity: Unlike systems that use multiple supervising LLMs, you deal with only one embedding model, which is more straightforward and efficient.
- Customization: The embedding space can be fine-tuned on domain-specific data, making it ideal for proprietary or specialized knowledge bases.
- Flexibility: While the focus since the release of ChatGPT has been on unstructured data, a properly set-up joint embedding space can also process structured data.
Why use Featrix?
Featrix handles both unstructured and structured data and lets you create joint embedding spaces, the key component of this advanced approach to RAG.
To integrate it with your application, you access the custom Featrix model through an API. You can host the model and API yourself, or we can host them for you.
Conclusion
By leveraging Featrix to create a joint embedding space for questions and answers, you can significantly improve the accuracy and efficiency of RAG systems. This approach simplifies the architecture while providing a more direct and accurate mapping between queries and relevant information.
Whether you're working with unstructured text, structured data, or a combination of both, Featrix offers a powerful tool for creating custom embedding spaces that can transform your RAG implementations. As we continue to push the boundaries of AI and information retrieval, approaches like this will be key to unlocking new possibilities in natural language understanding and generation, as well as in analytics.
What next?
- Learn how to apply the embedding approach to predictive analytics and exploratory data analysis in our previous blog posts
- Check out our offerings on the product page
- If you’re ready to work with the SDK and API, sign up for our free trial