Mixture-of-Agents Beats ChatGPT-4o
Abstract
The article discusses the potential of long-inference models, a new frontier for AI reasoning capabilities. It highlights a recent paper on Mixture-of-Agents (MoA), a collaborative framework that lets different large language models (LLMs) work together and produce better results on complex tasks than individual models like GPT-4.
Q&A
[01] Iteration is the Key
1. What is the definition of long-inference LLMs?
- Long-inference LLMs, instead of producing an answer directly, are allowed to iterate and self-reflect for a fixed or threshold-defined amount of time in order to produce a more 'thoughtful' answer.
- When models are "given more time to think," they obtain far superior results to the current generation, even when the base model is weaker.
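The iterate-and-self-reflect loop described above can be sketched as a draft/critique/revise cycle. This is a minimal illustration, not the paper's method: the `generate` function is a toy stub standing in for a real LLM call, and the prompt wording is invented.

```python
# Minimal sketch of a long-inference loop: draft, critique, revise.
# `generate` is a stand-in for a real LLM call; here it is a toy stub.

def generate(prompt: str) -> str:
    # Toy stub: a real system would call a model here.
    return f"response to: {prompt[:40]}"

def long_inference(question: str, max_iterations: int = 3) -> str:
    answer = generate(question)
    for _ in range(max_iterations):
        # Ask the model to critique its own draft, then revise it.
        critique = generate(f"Critique this answer to '{question}': {answer}")
        answer = generate(
            f"Question: {question}\nDraft: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer."
        )
    return answer
```

The `max_iterations` parameter plays the role of the "fixed or threshold-defined time" mentioned above: more iterations means more compute spent per answer.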
2. How do long-inference models differ from current LLMs in terms of thinking models?
- Current LLMs are System 1 thinkers, which are fast, unconscious, and intuitive.
- Long-inference models mimic System 2 thinking, which is slow, conscious, and deliberate.
- The article notes that current models allocate the same amount of per-prediction compute regardless of the task's complexity, whereas long-inference models aim to force the model to be 'more thoughtful' when solving complex tasks.
3. How do the Chain-of-Thought technique and long-inference frameworks compare in terms of results?
- The Chain-of-Thought technique is one way to activate System 2 thinking in current models, but its results are nowhere close to those of LLMs that actively search the solution space, as seen in long-inference frameworks.
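In its simplest zero-shot form, Chain-of-Thought is just a prompt-level change: an instruction nudging the model to reason step by step before answering. The helper below is an illustrative sketch, not from the paper.

```python
# Zero-shot Chain-of-Thought in its simplest form: append an instruction
# that nudges the model to reason step by step before answering.

def make_cot_prompt(question: str) -> str:
    return f"{question}\nLet's think step by step."

prompt = make_cot_prompt(
    "A train travels 60 km in 45 minutes. What is its speed in km/h?"
)
```

Unlike the search-based frameworks discussed next, this produces a single reasoning trace with no exploration of alternatives, which is why the article calls its results "nowhere close" to active search.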
[02] A Collaborative Effort
1. What are some examples of collaborative frameworks for LLMs mentioned in the article?
- Society of Minds, a collaboration framework for LLMs presented by MIT and Google researchers.
- Tree-of-Thoughts (ToT), a framework established by Google and Princeton University that showed LLMs can increase their results dramatically, especially in reasoning capabilities, when allowed to search the space of possible solutions.
- Chain of Preference Optimization, recent research that uses ToT to generate good 'thought sequences' and fine-tune a standard LLM, achieving strong results without the need for active search at inference time.
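The "search over the space of possible solutions" that Tree-of-Thoughts performs can be sketched as a beam search over partial 'thoughts'. This is a toy illustration under stated assumptions: `expand` and `score` are stubs where a real system would use LLM calls to propose and rate candidate thoughts.

```python
# Sketch of Tree-of-Thoughts as beam search over partial 'thoughts'.
# `expand` and `score` are toy stubs; a real system would use LLM calls
# to propose continuations and to rate how promising each one looks.

def expand(thought: str) -> list[str]:
    # Stub: propose a few candidate next steps for this line of reasoning.
    return [f"{thought} -> step {i}" for i in range(3)]

def score(thought: str) -> float:
    # Stub heuristic; an LLM would judge the thought's promise instead.
    return float(len(thought))

def tree_of_thoughts(root: str, depth: int = 2, beam: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        # Expand every thought in the frontier, keep the best `beam`.
        candidates = [c for t in frontier for c in expand(t)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)
```

The `beam` and `depth` parameters control how much of the solution space is explored, which is exactly the extra compute that Chain of Preference Optimization tries to distill away into a fine-tuned model.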
2. How does the Mixture-of-Agents (MoA) framework work?
- MoA is conceptually similar to mixture-of-experts, but instead of breaking one LLM into parts, it composes a 'grand LLM' out of several smaller ones, called agents.
- The agents, called proposers, generate possible responses to the provided prompt. The next layer of agents receives all proposed answers from the previous layer and can use this context to refine the responses.
- Finally, another LLM agent, called the aggregator, consolidates all the accumulated information and builds the final response.
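The proposer-layers-plus-aggregator pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `ask` is a toy stub for an LLM call, and the agent names and prompt wording are invented.

```python
# Sketch of the Mixture-of-Agents pipeline: layers of proposer agents,
# then a single aggregator. `ask` is a toy stub for an LLM call; the
# agent names and prompts below are illustrative, not from the paper.

def ask(agent: str, prompt: str) -> str:
    # Toy stub: a real system would route the prompt to model `agent`.
    return f"[{agent}] answer to: {prompt[:30]}"

def mixture_of_agents(question: str,
                      layers: list[list[str]],
                      aggregator: str) -> str:
    context = ""
    for layer in layers:
        # Each agent sees the question plus all answers from the
        # previous layer, and uses them to refine its own response.
        answers = [
            ask(agent, f"{question}\nPrevious answers:\n{context}")
            for agent in layer
        ]
        context = "\n".join(answers)
    # The aggregator consolidates everything into the final response.
    return ask(aggregator, f"{question}\nSynthesize these answers:\n{context}")

final = mixture_of_agents(
    "Explain photosynthesis briefly.",
    layers=[["model_a", "model_b"], ["model_c", "model_d"]],
    aggregator="model_a",
)
```

Note how the layers pass accumulated context forward rather than routing tokens internally, which is the key structural difference from mixture-of-experts.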
[03] Many LLMs are Better than One
1. How does the MoA framework perform compared to GPT-4 on benchmarks?
- A set of open-source models, each individually inferior to GPT-4, outperforms GPT-4 on the AlpacaEval 2.0 benchmark by an absolute margin of 7.6%.
- MoA is also competitive with GPT-4 on the FLASK benchmark, a more fine-grained evaluation.
2. What are the potential benefits of the MoA framework in terms of efficiency?
- Despite generating many more tokens on average, MoA's solutions appear more cost-efficient than frontier models, both in raw cost and in teraflops used per forward pass.
- The article suggests that long-inference could be a more efficient use of parameters, potentially requiring less compute than individual-but-bigger models, which could be the "best of both worlds" for the industry.