RoT: Enhancing Large Language Models with Reflection on Search Trees
Abstract
The article introduces Reflection on search Trees (RoT), a framework for improving tree-search-based prompting methods for large language models (LLMs) on reasoning and planning tasks. RoT uses a strong LLM to summarize guidelines from previous tree-search experiences. A weak LLM then follows these guidelines to avoid repeating past mistakes and to make better decisions during search.
Q&A
[01] Enhancing Large Language Models with Reflection on Search Trees
1. What is the key issue that RoT aims to address?
- Tree-search-based prompting methods often make repeated mistakes, such as incorrectly evaluating actions, generating low-quality actions, and failing to predict the next state, resulting in low accuracy and poor search efficiency.
2. How does RoT work to address this issue?
- RoT employs a strong LLM to reflect on the previous search process of a weak LLM and generate task-level guidelines. These guidelines are then used to enhance the weak LLM's capability of making the right decisions and estimations during subsequent search processes.
- RoT also includes a critical information extraction mechanism to select the most crucial states from the previous search trees, as these states have a significant impact on the outcomes.
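The reflect-then-search cycle described above can be sketched as a loop. This is a minimal, hypothetical sketch: `rot_loop`, the three callables, and the choice to append new guidelines to the existing ones are illustrative assumptions, not the article's implementation. In practice `search` would run BFS or MCTS with the weak LLM and `reflect` would prompt the strong LLM; the toy stubs only show how data flows between rounds.

```python
from typing import Callable, Dict, List

# Hypothetical signatures; in a real system each callable would wrap LLM calls.
SearchFn = Callable[[str, str], List[Dict]]    # (task, guidelines) -> visited states
SelectFn = Callable[[List[Dict]], List[Dict]]  # visited states -> important states
ReflectFn = Callable[[List[Dict]], str]        # important states -> new guidelines


def rot_loop(task: str, search: SearchFn, select: SelectFn,
             reflect: ReflectFn, rounds: int = 2) -> str:
    """Alternate weak-LLM search and strong-LLM reflection (illustrative sketch)."""
    guidelines = ""
    for _ in range(rounds):
        states = search(task, guidelines)  # weak LLM searches, guided by current guidelines
        important = select(states)         # keep only the most informative states
        new = reflect(important)           # strong LLM summarizes lessons from them
        # Assumption: new guidelines are appended to the existing ones.
        guidelines = (guidelines + "\n" + new).strip()
    return guidelines


# Toy stubs standing in for the LLM-backed components (hypothetical).
def toy_search(task: str, guidelines: str) -> List[Dict]:
    round_idx = len(guidelines.splitlines())  # 0 on the first round, 1 on the second
    return [{"state": f"{task}-r{round_idx}-s{i}", "gap": i} for i in range(3)]


def toy_select(states: List[Dict]) -> List[Dict]:
    return [max(states, key=lambda s: s["gap"])]


def toy_reflect(states: List[Dict]) -> str:
    return "Avoid: " + ", ".join(s["state"] for s in states)


print(rot_loop("blocks", toy_search, toy_select, toy_reflect))
```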
3. What are the key components of the RoT framework?
- Important state selection: Identifying the most informative states from the previous search trees, where making wise decisions can greatly improve future outcomes.
- Guideline summarization: Using a strong LLM to summarize guidelines based on the information collected from the important states, and then providing these guidelines to the weak LLM to enhance its performance.
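A minimal sketch of the two components, assuming each search-tree node stores a value estimate produced during the search. The importance score used here (the largest value gap between a node and any of its children, i.e. how much the choice of action at that node mattered) is an assumption for illustration. The names `Node`, `importance`, `select_important`, and `build_reflection_prompt`, as well as the prompt wording, are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A search-tree node; `value` is the search's estimate for this state."""
    state: str
    value: float
    children: List["Node"] = field(default_factory=list)


def importance(node: Node) -> float:
    """Assumed score: the largest value gap between a node and any of its children."""
    if not node.children:
        return 0.0
    return max(abs(child.value - node.value) for child in node.children)


def select_important(root: Node, k: int) -> List[Node]:
    """Visit every node in the tree and return the k with the highest importance."""
    nodes, stack = [], [root]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.children)
    return sorted(nodes, key=importance, reverse=True)[:k]


def build_reflection_prompt(task: str, nodes: List[Node]) -> str:
    """Assemble a hypothetical prompt asking the strong LLM for task-level guidelines."""
    lines = [f"Task: {task}", "Key decision points from previous searches:"]
    for n in nodes:
        outcomes = "; ".join(f"{c.state} -> {c.value:.2f}" for c in n.children)
        lines.append(f"- From '{n.state}': {outcomes}")
    lines.append("Summarize guidelines that help choose better actions next time.")
    return "\n".join(lines)


# Toy tree: one decision point with a good and a bad action.
root = Node("start", 0.5, [Node("stack A on B", 0.9), Node("put A on table", 0.1)])
top = select_important(root, k=1)
print(build_reflection_prompt("Blocksworld", top))
```

The strong LLM's answer to such a prompt would then be placed in the weak LLM's prompt for later searches.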
4. How does RoT compare to other reflection methods?
- RoT outperforms the recently proposed LEAP method, especially when the problem is hard, as RoT can generate more specific guidelines by focusing on the critical states.
- RoT can also benefit non-tree-search-based prompting methods, such as Chain-of-Thought (CoT), by providing task-specific knowledge collected from the search experience.
[02] Experiments
1. What tasks were used to evaluate RoT?
- The article evaluates RoT on a variety of complex reasoning and planning tasks, including:
  - Embodied planning in Blocksworld
  - Mathematical reasoning in GSM8k
  - Dialogue policy planning in CraigsListBargain
2. What prompting methods were used in the experiments?
- Tree-search-based prompting methods: Breadth-First Search (BFS) and Monte Carlo Tree Search (MCTS)
- Non-tree-search-based prompting methods: Chain-of-Thought (CoT) and CoT with self-consistency (CoT-SC)
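For readers unfamiliar with CoT-SC: self-consistency samples several chain-of-thought completions and keeps the most frequent final answer. Below is a minimal sketch of the voting step. The hard-coded answers are hypothetical stand-ins for sampled LLM outputs, and `self_consistency` is an illustrative name.

```python
from collections import Counter
from typing import List


def self_consistency(answers: List[str]) -> str:
    """Majority vote over the final answers of independently sampled CoT chains."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Ties go to the answer seen first: most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers extracted from five sampled chains for one GSM8k-style question.
samples = ["18", "18", "20", "18", "26"]
print(self_consistency(samples))  # 18
```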
3. What were the key findings from the experiments?
- RoT significantly improves the performance of various LLMs on the evaluated tasks when used with tree-search-based prompting methods.
- RoT also outperforms the LEAP method, especially on harder tasks.
- Non-tree-search-based prompting methods, such as CoT and CoT-SC, can also benefit from the guidelines generated by RoT.
- RoT has the greatest benefit on tasks that models are not familiar with.
4. How does RoT affect the search efficiency and accuracy of tree-search-based methods?
- RoT increases the area under the iteration-accuracy curve (AUC) of MCTS: it reaches higher accuracy within fewer MCTS iterations, improving both search efficiency and accuracy.
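AUC here is the area under a curve of accuracy plotted against the number of MCTS iterations. The sketch below computes it with the trapezoidal rule and normalizes by the iteration range so the result reads as average accuracy. The normalization and the function name `iteration_accuracy_auc` are assumptions. The accuracy values in the example are made up for illustration and are not results from the article.

```python
from typing import Sequence


def iteration_accuracy_auc(iters: Sequence[int], acc: Sequence[float],
                           normalize: bool = True) -> float:
    """Trapezoidal area under an accuracy-vs-iterations curve (illustrative sketch)."""
    if len(iters) != len(acc) or len(iters) < 2:
        raise ValueError("need at least two matching (iteration, accuracy) points")
    area = 0.0
    for i in range(1, len(iters)):
        width = iters[i] - iters[i - 1]
        area += width * (acc[i] + acc[i - 1]) / 2  # one trapezoid per interval
    if normalize:
        # Assumption: divide by the iteration span so the AUC reads as mean accuracy.
        area /= iters[-1] - iters[0]
    return area


# Made-up accuracy curves for illustration only (not results from the article).
iters = [1, 2, 3, 4]
baseline = [0.20, 0.30, 0.40, 0.50]
with_rot = [0.30, 0.45, 0.50, 0.55]
print(round(iteration_accuracy_auc(iters, baseline), 3))  # 0.35
print(round(iteration_accuracy_auc(iters, with_rot), 3))  # 0.458
```

A curve that climbs earlier yields a larger AUC even if both runs end at similar accuracy, which is why AUC captures search efficiency as well as final accuracy.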
5. What is the impact of the important state selection mechanism in RoT?
- Selecting the most crucial states from the previous search trees and summarizing guidelines based on them allows RoT to generate more specific and meaningful guidelines, leading to better performance improvements compared to using all search experiences or random states.
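The three settings compared above (important states, all states, random states) can be sketched as a single selection function. This is a hypothetical illustration. `choose_states` is an invented name, and the importance scores are assumed to have been computed beforehand by some importance measure over the previous search tree.

```python
import random
from typing import Dict, List


def choose_states(scores: Dict[str, float], strategy: str, k: int,
                  seed: int = 0) -> List[str]:
    """Pick states to reflect on: all of them, k at random, or the k most important."""
    states = list(scores)
    if strategy == "all":
        return states
    if strategy == "random":
        return random.Random(seed).sample(states, min(k, len(states)))
    if strategy == "important":
        return sorted(states, key=lambda s: scores[s], reverse=True)[:k]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical importance scores for four states from a previous search tree.
scores = {"s0": 0.05, "s1": 0.40, "s2": 0.10, "s3": 0.35}
print(choose_states(scores, "important", k=2))  # ['s1', 's3']
print(len(choose_states(scores, "all", k=2)))   # 4
```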