
Together MoA — collective intelligence of open-source models pushing the frontier of LLM capabilities

🌈 Abstract

The article introduces Mixture of Agents (MoA), a novel approach that leverages the collective strengths of multiple large language models (LLMs) to achieve superior performance compared to state-of-the-art closed-source models. MoA uses a layered architecture where each layer comprises several LLM agents that take the outputs from the previous layer as auxiliary information to generate refined responses. This allows MoA to effectively integrate diverse capabilities and insights from various models, resulting in a more robust and versatile combined model.

🙋 Q&A

[01] Mixture of Agents (MoA)

1. What is the key observation that the research is based on?

  • The research is based on the observation of the "collaborativeness of LLMs" - the phenomenon where an LLM tends to generate better responses when presented with outputs from other models, even if these other models are less capable on their own.

2. How does the MoA approach work?

  • MoA adopts a layered architecture where each layer comprises several LLM agents.
  • These agents take the outputs from the previous layer as auxiliary information to generate refined responses.
  • This approach allows MoA to effectively integrate diverse capabilities and insights from various models, resulting in a more robust and versatile combined model.

3. What are the different roles of the models in the MoA approach?

  • Proposers: These models generate initial reference responses, offering nuanced and diverse perspectives that serve as valuable references for the aggregator.
  • Aggregators: These models synthesize the different responses from the proposers into a single, high-quality response.
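The aggregator's job of synthesizing proposer outputs can be pictured as prompt construction: the proposer responses are appended to the user query as auxiliary references. The template below is an illustrative assumption, not the article's or paper's verbatim prompt.

```python
# Illustrative sketch of building an aggregator prompt from proposer
# outputs. The wording of the template is an assumption for clarity;
# the actual MoA system prompt may differ.
def build_aggregator_prompt(user_prompt: str, proposer_responses: list[str]) -> str:
    # Number each proposer response so the aggregator can weigh them.
    references = "\n".join(
        f"{i + 1}. {resp}" for i, resp in enumerate(proposer_responses)
    )
    return (
        "You have been provided with responses from various models to the "
        "latest user query. Synthesize these into a single, high-quality "
        f"response.\n\nResponses:\n{references}\n\nQuery: {user_prompt}"
    )
```

The resulting string would then be sent as the aggregator model's input, so the aggregator sees every proposer's answer alongside the original question.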

4. How does the layered process of MoA improve responses?

  • Several proposers independently generate responses to a given prompt.
  • These responses are then presented to aggregators in the next layer, which synthesize them into higher-quality responses.
  • This iterative process continues through several layers until a more robust and comprehensive response is achieved.
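The layered process above can be sketched as a simple loop, where each layer's agents consume the previous layer's outputs as references. `Agent` here is a stand-in for any function that maps (prompt, reference responses) to a response, such as a call to an open-source LLM; all names are illustrative assumptions, not the article's implementation.

```python
from typing import Callable

# An "agent" takes the user prompt plus the previous layer's responses
# (auxiliary references) and returns its own response.
Agent = Callable[[str, list[str]], str]

def mixture_of_agents(
    prompt: str, layers: list[list[Agent]], aggregator: Agent
) -> str:
    """Hedged sketch of the layered MoA flow described above."""
    references: list[str] = []  # the first layer sees no auxiliary outputs
    for layer in layers:
        # Every agent in a layer independently responds, conditioned on
        # the previous layer's outputs.
        references = [agent(prompt, references) for agent in layer]
    # A final aggregator synthesizes the last layer's responses into one.
    return aggregator(prompt, references)
```

In practice each `Agent` would wrap an LLM API call that embeds the references into the model's prompt; the loop structure is what makes the refinement iterative across layers.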

[02] Evaluation and Results

1. What are the key benchmarks used to evaluate the performance of MoA?

  • The article evaluates the performance of MoA on three standard benchmarks: AlpacaEval 2.0, MT-Bench, and FLASK.

2. What are the key results of MoA on these benchmarks?

  • On AlpacaEval 2.0, the Together MoA configuration achieved a score of 65.1%, significantly surpassing GPT-4o's 57.5%, while using only open-source models.
  • On MT-Bench, the Together MoA configuration achieved the highest average score of 9.40, outperforming other models.
  • On FLASK, the Together MoA method significantly outperformed the original Qwen1.5-110B-Chat and GPT-4o on various dimensions, such as harmlessness, robustness, correctness, efficiency, factuality, commonsense, insightfulness, and completeness.

3. How does the number of proposer models affect the performance of MoA?

  • Increasing the number of proposer models consistently improves the performance, with the Multiple-Proposer configuration outperforming the Single-Proposer configuration.
  • This highlights the value of leveraging diverse perspectives and capabilities that different models offer.

4. How does the MoA approach balance cost and performance?

  • The article presents a figure that illustrates the relationship between the LC win rate (performance) and the average inference cost for each query.
  • The Together MoA configuration is the best choice if prioritizing performance, while the Together MoA-Lite configuration can match the cost of GPT-4o while achieving higher quality.

[03] Future Directions

1. What are the potential future directions for the MoA approach?

  • Systematic optimization of the MoA architecture, exploring various choices of models, prompts, and architectural configurations.
  • Reducing time-to-first-token latency.
  • Evaluation and optimization of Together MoA for more reasoning-focused tasks to enhance its ability to tackle complex and nuanced challenges in AI.