From gen AI 1.5 to 2.0: Moving from RAG to agent systems
Abstract
The article discusses the evolution of generative AI foundation models, including large language models (LLMs) and multimodal models, and the opportunities and challenges in bringing these solutions into production to create real impact. It covers the current state of "Gen AI 1.0" with LLMs and emergent behavior, the progress of "Gen AI 1.5" with retrieval-augmented generation and embedding models, and the future vision of "Gen AI 2.0" with agent-based systems that chain multiple forms of generative AI functionality together. The article also highlights the need to optimize the cost and performance of these solutions as they mature in production.
Q&A
[01] Gen AI 1.0: LLMs and emergent behavior from next-token generation
1. What is the core functionality of foundation models (FMs) under the hood? FMs convert words, images, numbers, and sounds into tokens, and then predict the 'best next token', the one most likely to produce a response that the person interacting with the model will like. By learning from user feedback for over a year, the core models have become much more in tune with what people want from them. (A minimal next-token prediction sketch follows this list.)
2. How has the generative AI community developed techniques to get the models to respond effectively? The community has developed "prompt-engineering" techniques, such as providing a few examples (a few-shot prompt) to coach the model toward the desired answer style, or asking the model to break the problem down (a chain-of-thought prompt) so that it generates more intermediate tokens, increasing the likelihood of arriving at the correct answer to a complex question. (Both styles are illustrated in the second sketch below.)
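A minimal sketch of next-token prediction, using the small open gpt2 model via the `transformers` library purely as a stand-in; production foundation models work the same way at vastly larger scale:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is a small open model used here only for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")  # words -> tokens

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every token in the vocabulary

# The 'best next token' is the highest-scoring candidate at the last position.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode(next_token_id))  # the single most likely continuation
```

Generation simply repeats this step, appending each predicted token to the prompt and scoring the vocabulary again.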
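As a concrete illustration of the two prompting techniques (the prompt text is invented for this sketch, and no particular provider API is assumed):

```python
# A few-shot prompt coaches the model with worked examples of the desired style.
few_shot_prompt = """Classify each review as positive or negative.

Review: The battery died after two days. Sentiment: negative
Review: Setup took thirty seconds and it just works. Sentiment: positive
Review: The screen scratches far too easily. Sentiment:"""

# A chain-of-thought prompt asks the model to reason step by step, generating
# intermediate tokens that raise the odds of a correct final answer.
cot_prompt = """Q: A train travels 60 km in 45 minutes. What is its speed in km/h?
A: Let's think step by step."""
```

Either string can be sent as the user message to any chat-completion endpoint.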
[02] Gen AI 1.5: Retrieval-augmented generation, embedding models, and vector databases
1. How have advances in processing capacity and retrieval-augmented generation expanded the capabilities of LLMs? State-of-the-art models can now process up to 1M tokens, enabling users to control the context the models draw on to answer questions in ways that weren't previously possible. Additionally, the development of embedding models and vector databases has made it possible to retrieve similar text based on concepts instead of just keywords, further expanding the information that LLMs can leverage. (A minimal retrieval sketch follows this list.)
2. What are some of the challenges in scaling these LLM-based solutions in production? Scaling these solutions in production is a complex endeavor, requiring teams from multiple backgrounds to optimize for security, scaling, latency, cost, and data/response quality, since there are as yet no standard solutions in the space of LLM-based applications.
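A minimal sketch of concept-based retrieval, assuming the `sentence-transformers` package and the all-MiniLM-L6-v2 embedding model (both chosen for illustration; any embedding model and vector database follow the same pattern):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The quarterly earnings call is scheduled for next Thursday.",
    "Support tickets are answered within one business day.",
]
doc_embeddings = model.encode(documents)  # text -> dense vectors

# 'money back' shares no keywords with the refund document, but the
# embeddings place the query and that document near each other.
query_embedding = model.encode("Can I get my money back?")
scores = util.cos_sim(query_embedding, doc_embeddings)[0]

best = int(scores.argmax())
print(documents[best])  # retrieved context, ready to prepend to an LLM prompt
```

In a production RAG system the documents live in a vector database, and the top matches are inserted into the model's context window before it answers.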
[03] Gen AI 2.0 and agent systems
1. What is the next evolution in generative AI beyond the incremental improvements in model and system performance? The next evolution is creatively chaining multiple forms of generative AI functionality together through agent-based systems. These systems use multimodal models in multiple ways, powered by a 'reasoning engine' (typically an LLM) that can break a problem down into steps, select from a set of AI-enabled tools to execute each step, and rethink the overall solution plan. (A simplified agent loop is sketched after this list.)
2. What are the potential benefits and challenges of these agent-based systems? The benefits include a more flexible set of solutions and the ability to tackle much more complex tasks. However, these systems can be extremely expensive to run, with thousands of LLM calls passing large numbers of tokens to the API, so parallel development of LLM optimization techniques is needed to bring the costs of these solutions down.
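A highly simplified sketch of such an agent loop. Everything here is illustrative: `call_llm` is a hypothetical stand-in for any chat-completion API, and the two tools are stubs.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call."""
    raise NotImplementedError("wire up a model provider here")

def search_web(query: str) -> str:
    return f"(stub) search results for {query!r}"

def run_calculator(expression: str) -> str:
    return str(eval(expression))  # demo only; never eval untrusted input

TOOLS = {"search_web": search_web, "run_calculator": run_calculator}

def agent(task: str, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # The reasoning engine plans the next step as JSON.
        decision = json.loads(call_llm(
            "Given the task and history below, reply with JSON: "
            '{"action": <tool name or "finish">, "input": <string>}\n'
            + "\n".join(history)
        ))
        if decision["action"] == "finish":
            return decision["input"]
        # Execute the chosen tool and feed the observation back in.
        observation = TOOLS[decision["action"]](decision["input"])
        history.append(f"{decision['action']} -> {observation}")
    return "Step budget exhausted."
```

Note that every loop iteration is another LLM call carrying the full, growing history, which is exactly why costs climb so quickly at scale.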