RAG is Dead. Long Live RAG! - Qdrant
Abstract
The article discusses the ongoing debate over whether Retrieval Augmented Generation (RAG) is still needed alongside large language models (LLMs). It argues that, despite claims that increasingly accurate LLMs no longer need RAG, vector search and RAG remain crucial for enterprise-level AI systems.
Q&A
[01] RAG is Dead. Long Live RAG!
1. What are the key points made about the continued importance of RAG?
- Larger context windows in LLMs are not the solution, as they require more computational resources and lead to slower processing times.
- Relying on the LLM alone to both retrieve and reason over information is not the right approach. Vector search offers much higher retrieval precision.
- A large context window can make it harder to focus on relevant information, increasing the risk of errors or hallucinations in the LLM's responses.
- Vector search in a compound system, using a vector database like Qdrant, is superior to monolithic LLMs in terms of speed, accuracy, and cost-efficiency.
- RAG allows LLMs to pull in real-time information from up-to-date internal and external knowledge sources, making them more dynamic and adaptable.
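The retrieve-then-generate pattern described above can be sketched in a few lines. This is a toy illustration, not the article's implementation: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the in-memory list `DOCS` stands in for a vector database such as Qdrant, and `build_prompt` only assembles the text that would be sent to an LLM.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Put only the retrieved passages, not the whole corpus, in the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical mini-corpus used only for illustration.
DOCS = [
    "The refund policy allows returns within 30 days.",
    "Our office is closed on public holidays.",
    "Refund requests are processed within 5 business days.",
]

if __name__ == "__main__":
    print(build_prompt("how long does a refund take", DOCS))
```

The point of the design is that the LLM sees a short, focused context chosen by the retrieval step, rather than everything that happens to fit in a large context window.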
2. What are the economic benefits of using a vector database like Qdrant in an enterprise RAG scenario?
- Serving queries over petabytes of private enterprise data can be extremely costly when the LLM does the heavy lifting: each 100,000-token input costs $1 at the GPT-4 Turbo pricing cited in the article.
- Vector search queries are estimated to be at least 100 million times cheaper than queries made by LLMs.
- The only upfront investment with a vector database is indexing. After that, leaning more heavily on vector retrieval minimizes the use of compute-heavy LLMs, which is why the article calls Qdrant an irreplaceable optimization measure.
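The token-cost arithmetic behind these points can be worked through as follows. Only the $1 per 100,000 input tokens figure comes from the summary; the 2,000-token retrieved context and the 1,000-query workload are assumed numbers chosen purely for illustration.

```python
# From the summary: a 100,000-token input costs $1 (GPT-4 Turbo pricing).
PRICE_PER_100K_TOKENS = 1.00


def input_cost(tokens_per_query: int, queries: int = 1) -> float:
    """Dollar cost of LLM input tokens for a number of queries."""
    return tokens_per_query * queries * PRICE_PER_100K_TOKENS / 100_000


# Assumed workload: 1,000 queries.
full_context_cost = input_cost(100_000, queries=1_000)  # stuff the context window
retrieved_cost = input_cost(2_000, queries=1_000)       # assumed 2,000 retrieved tokens

if __name__ == "__main__":
    print(f"Full 100k-token context: ${full_context_cost:,.2f}")
    print(f"Retrieved 2k-token context: ${retrieved_cost:,.2f}")
```

Under these assumptions, shrinking the prompt from 100,000 to 2,000 tokens cuts the LLM input bill by a factor of 50, before counting the much smaller cost of the vector search itself.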
[02] Vector Search in Compound Systems
1. What are the key findings about the use of compound systems in LLM applications?
- According to Zaharia et al., 60% of LLM applications use some form of RAG, while 30% use multi-step chains.
- Even Gemini 1.5, a powerful LLM, relies on a multi-call strategy: it needed 32 calls to reach 90.0% accuracy on the MMLU benchmark. The article takes this as evidence that even a basic compound arrangement is superior to a monolithic model.
- Introducing a vector database like Qdrant into the design of a compound system opens up possibilities for superior LLM applications, because such systems are faster, more accurate, and much cheaper to run.
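The summary does not say how the 32 calls were combined. One common way to build such a multi-call arrangement is to sample several answers and keep the most frequent one. The sketch below assumes that majority-vote scheme; `ask` is a hypothetical callable standing in for a single LLM call, and `make_noisy_model` is a simulated model, not a real API.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(ask: Callable[[str], str], question: str, n_calls: int = 32) -> str:
    """Call the model n_calls times and return the most frequent answer."""
    answers = [ask(question) for _ in range(n_calls)]
    return Counter(answers).most_common(1)[0][0]


def make_noisy_model(correct: str, wrong: list[str], p_correct: float,
                     seed: int = 0) -> Callable[[str], str]:
    """Simulated model: answers correctly with probability p_correct."""
    rng = random.Random(seed)

    def ask(question: str) -> str:
        return correct if rng.random() < p_correct else rng.choice(wrong)

    return ask


if __name__ == "__main__":
    model = make_noisy_model("A", ["B", "C", "D"], p_correct=0.6)
    print(majority_vote(model, "Which option is right?", n_calls=32))
```

Aggregating over many calls trades extra compute for accuracy, which is the sense in which a benchmark score obtained this way comes from a small compound system rather than a single model call.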
2. What is the key advantage of RAG mentioned in the article?
- RAG allows an LLM to pull in real-time information from up-to-date internal and external knowledge sources, making it more dynamic and adaptable to new information.