What Nobody Tells You About RAGs
๐ Abstract
A deep dive into the challenges and best practices of building a Retrieval Augmented Generation (RAG) system for real-world business scenarios, covering the business value, data handling, and technical optimizations required.
๐ Q&A
[01] Clarify the business value from the start: the context, the users, and the data
1. What are some key business requirements to consider before starting a RAG/LLM-based project?
- Clarify the context and understand the users' main business issues that the RAG system can help address
- Educate non-technical users on the capabilities and limitations of generative AI
- Understand the user journey and how the RAG system will integrate into existing workflows
- Anticipate the data to be indexed, qualify it, and map it to the users' needs
- Define clear success criteria and metrics to evaluate the project's ROI
[02] Understand what you're indexing
1. What are the different data modalities that can be indexed in a RAG system?
- Text data
- Images and diagrams
- Tables
- Code snippets
2. How can these multimodal data sources be combined in a RAG system?
- Text data is chunked and embedded using a text embedding model
- Tables are summarized with an LLM, and their descriptions are embedded and used for indexing
- Code snippets are chunked and embedded using a text embedding model
- Images are converted into embeddings using a multimodal vision and language model
[03] Improve chunk quality โ garbage in, garbage out
1. What are some tips for improving the quality of text chunks in a RAG system?
- Leverage document metadata like table of contents, titles, or headers to provide contextually relevant chunks
- Adjust chunk size based on the characteristics of the data (e.g., longer chunks for wordy documents, shorter chunks for bullet-point style)
- Explore semantic chunking techniques to generate chunks that are semantically relevant
[04] Improve pre-retrieval
1. What are some key pre-retrieval techniques to consider?
- Query rewriting: Use an LLM to rephrase the user's query to improve clarity and specificity
- Query expansion with Hypothetical Document Embedding (HyDE): Generate a hypothetical answer and use it to retrieve more relevant documents
- Query augmentation: Combine the original query with the preliminary generated outputs to retrieve more relevant information
[05] Improve retrieval
1. What are some techniques for improving the retrieval step in a RAG system?
- Hybrid search: Combine vector search and keyword search to leverage the advantages of both
- Filter on metadata: Use additional metadata properties to pre-filter the vector space and improve relevance
- Test multiple embedding models and fine-tune them for domain-specific data
[06] Improve post-retrieval
1. What are some post-retrieval techniques to increase the relevancy of the retrieved documents?
- Reranking: Re-order the retrieved documents based on their alignment with the query
- Remove irrelevant chunks: Use an LLM to filter out unimportant sections or chunks from the retrieved documents
[07] An overlooked part: Generation
1. What are some tips for enhancing the answer generation step in a RAG system?
- Define a system prompt to guide the LLM's behavior and writing style
- Include few-shot examples in the system prompt to provide context for complex tasks
- Force the LLM to generate structured outputs when appropriate
- Leverage techniques like Chain of Thought to improve reasoning and summarization in the generated answers