MindSearch 思·索: Mimicking Human Minds Elicits Deep AI Searcher
🌈 Abstract
The article introduces MindSearch, a novel LLM-based multi-agent framework for complex web information-seeking and integration tasks. It aims to leverage the strengths of both search engines and LLMs by:
- Modeling the problem-solving process as an iterative graph construction, where the WebPlanner decomposes the user query into atomic sub-questions and progressively extends the graph based on the search results from WebSearcher.
- Employing a hierarchical retrieval process in WebSearcher to efficiently extract valuable data from massive web pages for LLMs.
- Distributing the reasoning and retrieval process across specialized agents, reducing the load on each individual agent and enabling robust handling of long contexts.
The experiments demonstrate that MindSearch significantly outperforms baseline approaches in both closed-set and open-set question answering tasks, and its responses are preferred by human evaluators over existing applications like ChatGPT-Web and Perplexity.ai.
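A minimal end-to-end sketch of the division of labor summarized above, assuming hypothetical placeholders (`call_llm`, `web_search`, and the prompt formats are illustrative, not the actual MindSearch interfaces): the planner proposes atomic sub-questions, each one is answered by a searcher over live web results, and the answers flow back until the planner can write the final response.

```python
"""End-to-end sketch of the planner/searcher loop; every name, prompt, and
interface below is an assumption for illustration, not the MindSearch code."""
from typing import Dict, List


def call_llm(prompt: str) -> str:
    raise NotImplementedError        # placeholder for any chat-model client


def web_search(query: str) -> List[str]:
    raise NotImplementedError        # placeholder for a search-engine API


def web_searcher(sub_question: str) -> str:
    """Answer one atomic sub-question from retrieved snippets."""
    snippets = web_search(sub_question)
    return call_llm(f"Answer '{sub_question}' using:\n" + "\n".join(snippets))


def web_planner(query: str) -> str:
    """Iteratively decompose the query, dispatch searches, and synthesize."""
    answers: Dict[str, str] = {}
    while True:
        reply = call_llm(
            f"Query: {query}\nKnown answers: {answers}\n"
            "List the remaining atomic sub-questions, one per line, or NONE."
        )
        if reply.strip() == "NONE":                      # plan is complete
            return call_llm(f"Write the final answer to '{query}' from: {answers}")
        for sub_question in reply.splitlines():
            answers[sub_question] = web_searcher(sub_question)
```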
🙋 Q&A
[01] WebPlanner: Planning via Graph Construction
1. How does the WebPlanner model the problem-solving process? The WebPlanner models the problem-solving process as a directed acyclic graph (DAG), where each node represents an independent web search and each edge encodes a reasoning dependency between nodes (a node can only be searched once its parent nodes have been answered). This graph formalism captures the complex relationships among sub-questions and the difficulty of finding an optimal execution path.
2. How does the WebPlanner leverage the LLM's code generation ability? The WebPlanner explicitly prompts the LLM to interact with the graph through code writing. It predefines atomic code functions for adding nodes or edges to the graph; the LLM outputs its current thought together with new code that reasons over the mind graph, and that code is then executed by a Python interpreter (see the sketch below).
3. What are the benefits of the "code as planning" process in the WebPlanner? The "code as planning" process lets the LLM fully exploit its strong code-generation ability, which improves the handling of control and data flow in long-context scenarios and leads to better performance on complex problems.
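To make the two answers above concrete, here is a minimal sketch, assuming hypothetical names (`MindGraph`, `SearchNode`, the `<code>` delimiter, and the example prompt are not reproduced from MindSearch): sub-questions live in a small DAG, the LLM's reply is split into a thought and a code snippet, and the snippet is executed by a Python interpreter against the predefined graph-editing methods.

```python
# Hypothetical sketch of "planning as DAG construction via code"; all names,
# delimiters, and prompts are illustrative, not the MindSearch implementation.
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SearchNode:
    question: str                                      # atomic sub-question
    answer: Optional[str] = None                       # filled by a WebSearcher
    parents: List[str] = field(default_factory=list)   # reasoning dependencies


class MindGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, SearchNode] = {}

    def add_node(self, name: str, question: str) -> None:
        self.nodes[name] = SearchNode(question)

    def add_edge(self, src: str, dst: str) -> None:
        self.nodes[dst].parents.append(src)            # dst depends on src's answer

    def ready_nodes(self) -> List[str]:
        """Unanswered nodes whose dependencies are all answered."""
        return [
            name for name, node in self.nodes.items()
            if node.answer is None
            and all(self.nodes[p].answer is not None for p in node.parents)
        ]


graph = MindGraph()

# A typical (hypothetical) planner reply: free-form thought, then code to run.
llm_output = """Thought: I need each founder's birth year before comparing ages.
<code>
graph.add_node("founder_a", "Who founded company A, and when were they born?")
graph.add_node("founder_b", "Who founded company B, and when were they born?")
graph.add_node("compare", "Given both birth years, which founder is older?")
graph.add_edge("founder_a", "compare")
graph.add_edge("founder_b", "compare")
</code>"""

snippet = re.search(r"<code>(.*?)</code>", llm_output, re.DOTALL).group(1)
exec(snippet, {"graph": graph})    # run the plan against the predefined graph API
print(graph.ready_nodes())         # ['founder_a', 'founder_b'] -- 'compare' must wait
```

Answered nodes feed their results back into the planner's next turn, so the graph is extended progressively until the planner decides the query is resolved.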
[02] WebSearcher: Web Browsing with Hierarchical Retrieval
1. What is the key strategy employed by the WebSearcher? The WebSearcher uses a straightforward coarse-to-fine selection strategy. It first generates several similar queries from the question assigned by the WebPlanner to broaden the search coverage, then selects the most valuable pages for detailed reading (see the sketch below).
2. How does the hierarchical retrieval approach help the WebSearcher? The hierarchical retrieval approach significantly reduces the difficulty of navigating massive web pages and allows the WebSearcher to efficiently extract highly relevant information with in-depth details.
3. How does the multi-agent design of MindSearch benefit the context management? The multi-agent design of MindSearch greatly reduces the context computation during the whole process, as the WebPlanner and WebSearcher can each focus on their own task without being distracted by overly long web search results or other irrelevant content.
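As a concrete illustration of the coarse-to-fine, hierarchical retrieval described above, here is a sketch with hypothetical helpers (`call_llm`, `search_api`, `fetch_page`, and the prompts are assumptions, not real MindSearch or search-engine interfaces):

```python
"""Coarse-to-fine WebSearcher sketch; all helper names and prompts are hypothetical."""
from typing import Dict, List


def call_llm(prompt: str) -> str:
    raise NotImplementedError   # placeholder for an LLM call


def search_api(query: str) -> List[Dict[str, str]]:
    raise NotImplementedError   # placeholder: returns [{"url": ..., "snippet": ...}]


def fetch_page(url: str) -> str:
    raise NotImplementedError   # placeholder: full page content


def answer_sub_question(sub_question: str, k_queries: int = 3, k_pages: int = 4) -> str:
    # 1. Coarse stage: broaden coverage with several rephrased queries.
    queries = call_llm(
        f"Rewrite '{sub_question}' into {k_queries} similar search queries, one per line."
    ).splitlines()
    results = {r["url"]: r["snippet"] for q in queries for r in search_api(q)}

    # 2. Selection: let the LLM pick the most valuable pages from URLs and snippets.
    listing = "\n".join(f"{u}: {s}" for u, s in results.items())
    chosen = call_llm(
        f"Question: {sub_question}\nCandidates:\n{listing}\n"
        f"Return the {k_pages} most useful URLs, one per line."
    ).splitlines()

    # 3. Fine stage: read only the selected pages in depth, then answer.
    pages = "\n\n".join(fetch_page(u) for u in chosen)
    return call_llm(f"Answer '{sub_question}' using only these pages:\n{pages}")
```

Because each searcher returns only this final condensed answer to the planner, the planner's context grows with the number of graph nodes rather than with the raw length of the pages read, which is the context saving described in the last answer above.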
[03] Experiments
1. What are the two primary categories of Question Answering (QA) tasks used to evaluate MindSearch? The two primary categories of QA tasks used to evaluate MindSearch are closed-set QA and open-set QA.
2. How did MindSearch perform in the open-set QA evaluation? In the open-set QA evaluation, MindSearch demonstrated significant improvement in the depth and breadth of its responses compared to other applications like Perplexity.ai and ChatGPT-Web. However, MindSearch did not yield better performance in terms of factuality.
3. How did MindSearch perform in the closed-set QA evaluation? In the closed-set QA evaluation, MindSearch significantly outperformed its vanilla baselines (an LLM without search engines and ReAct Search) by a large margin, validating the effectiveness of the proposed method. These advantages were amplified when moving from closed-source LLMs to open-source LLMs.