The Future of LLM-Based Agents: Making the Boxes Bigger | Arcus
Abstract
The article discusses the challenges and innovations required to make AI agents more reliable and capable of handling complex, compound tasks in real-world applications. It focuses on two key areas: long-term planning and system-level robustness.
Q&A
[01] Long-term Planning
1. What are the limitations of the greedy approach to planning used in common agent frameworks today? The greedy approach, where the agent only ever decides its immediate next step, is error-prone and unreliable for complex tasks. It lacks contingency planning and the ability to adapt to unforeseen events, leading to delays and dead ends, much like a person trying to navigate NYC by picking the best-looking next street with no overall route in mind.
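The greedy loop described above can be sketched in a few lines. This is a hypothetical illustration, not the framework's actual code; `llm_next_step` and `execute` stand in for a model call and a tool call.

```python
# Hypothetical sketch of a greedy agent loop: at each turn the model
# picks only the immediate next action, with no plan or contingencies.
def greedy_agent(goal, llm_next_step, execute, max_steps=20):
    history = []
    for _ in range(max_steps):
        step = llm_next_step(goal, history)  # looks only one move ahead
        if step == "DONE":
            return history
        result = execute(step)
        history.append((step, result))  # no replanning if a step misfires
    return history  # may stall or wander without ever reaching the goal
```

Because nothing above the single-step decision exists, one bad step can send the whole trajectory off course.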
2. How can decomposing a larger goal into tractable sub-goals help with long-term planning? Decomposing the overarching goal into individual sub-goals allows for a higher-level, bigger-picture plan of the steps that need to be taken. This enables the distribution of sub-goals to more specialized "executor agents" that are better equipped to solve certain types of tasks.
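A minimal sketch of the plan-then-delegate pattern, assuming a planner that emits (skill, sub-goal) pairs and a registry of specialized executors; all names here are illustrative, not from the article.

```python
# Plan-then-delegate: a planner decomposes the goal into sub-goals,
# each routed to the executor agent specialized for that kind of task.
def plan(goal):
    # A planner LLM would produce this; hard-coded here for illustration.
    return [("research", f"gather sources for {goal}"),
            ("write",    f"draft a summary of {goal}")]

def run(goal, executors):
    results = []
    for skill, sub_goal in plan(goal):
        agent = executors[skill]         # pick the specialist for this sub-goal
        results.append(agent(sub_goal))  # executor works on its sub-goal only
    return results
```

The bigger-picture plan lives in the planner's output, while each executor only needs to be good at its own narrow class of tasks.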
3. What are the main challenges in having an agent framework that handles long-term planning well? The two main challenges are: 1) Putting the long-term plans together, and 2) Enabling agents to adapt and reflect at each sub-goal to handle dynamic environments.
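The second challenge, adapting and reflecting at each sub-goal, can be sketched as a critique-and-retry loop. This is an assumed shape, not the article's implementation; `execute` and `critique` are placeholder callables (the critic could be an LLM judge or a deterministic rule).

```python
# Reflect-and-adapt: after each sub-goal, a critique step decides whether
# to accept the result or revise the sub-goal and try again.
def run_with_reflection(sub_goals, execute, critique, max_retries=2):
    results = []
    for sub_goal in sub_goals:
        for attempt in range(max_retries + 1):
            result = execute(sub_goal)
            verdict = critique(sub_goal, result)  # "ok" or "retry"
            if verdict == "ok":
                results.append(result)
                break
            # Revise the sub-goal in light of the failed attempt.
            sub_goal = f"{sub_goal} (revised after attempt {attempt + 1})"
        else:
            raise RuntimeError(f"could not complete sub-goal: {sub_goal}")
    return results
```

The key difference from the greedy loop is that failure at a sub-goal triggers a local revision instead of blindly moving on.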
[02] System-level Robustness
1. What is the current limitation of LLMs in terms of tool-calling accuracy for complex tasks? The top-performing model on the Berkeley Function Calling Leaderboard has an average tool-calling accuracy of 87%. As the complexity of a goal increases, requiring more tool calls, the probability of successful completion drops exponentially, preventing real-world, production-worthy performance.
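The compounding failure is easy to quantify: if each tool call succeeds independently with probability p, a goal requiring n calls succeeds end to end with probability p^n. A quick check at the leaderboard's 87% figure:

```python
# If each tool call succeeds independently with probability p,
# a goal needing n sequential calls succeeds with probability p ** n.
def chain_success(p, n):
    return p ** n

# At 87% per-call accuracy, end-to-end reliability decays fast:
for n in (1, 5, 10, 20):
    print(n, round(chain_success(0.87, n), 3))
# → 1 0.87, 5 0.498, 10 0.248, 20 0.062
```

By ten chained calls the system already fails more often than it succeeds, which is why per-call accuracy alone cannot deliver production-grade agents.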
2. What are the two main design patterns that can help build robustness into agent workflows? The two main design patterns are:
- Orchestrating LLM agents together to provide higher-order guarantees, determinism, and checks-and-balances
- Baking in verification steps that provide guarantees over the performance of the overall system against the longer-term plan, rather than over individual steps or sub-goals alone
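The two patterns above can be combined in one orchestration loop: a deterministic validator gates each agent's output, and a supervisor check compares overall state against the long-term plan. This is a minimal sketch under assumed interfaces (`validate`, `check_progress`, and the plan/agent shapes are all illustrative):

```python
# Checks-and-balances orchestration: per-step validation plus a
# system-level progress check against the overall plan.
def orchestrate(plan, agents, validate, check_progress):
    state = {}
    for step in plan:
        output = agents[step["agent"]](step["task"], state)
        if not validate(step, output):        # per-step guardrail
            raise ValueError(f"step failed validation: {step['task']}")
        state[step["task"]] = output
        if not check_progress(plan, state):   # system-level check vs the plan
            raise ValueError("system drifted from the long-term plan")
    return state
```

The point of the design is that determinism and guarantees come from the surrounding system, not from trusting any single LLM call.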
3. How do these system-level approaches help in scaling the complexity of tasks that agents can solve reliably? These compound AI systems, which orchestrate LLM agents and provide various reliability checks, can help increase the complexity of tasks that agents can solve reliably as the overall system complexity grows.