
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers

🌈 Abstract

The paper proposes a new decision-making task, Decision QA, in which a language model must determine the best decision given a decision-making question, business rules, and a database. To address this task, the authors introduce PlanRAG, a retrieval-augmented generation technique that first makes a plan for the necessary data analysis, then retrieves the relevant data, and iteratively re-plans when the initial plan proves insufficient. The authors also introduce a benchmark called DQA, which consists of two scenarios (Locating and Building) extracted from video games that mimic real-world business situations. Experiments show that PlanRAG significantly outperforms the state-of-the-art iterative RAG technique on the Decision QA task.

🙋 Q&A

[01] Decision QA Task

1. What is the Decision QA task?

  • Decision QA is a new task that requires a language model to determine the best decision given a decision-making question, business rules, and a database.
  • The task involves three steps: (1) making a plan for the necessary data analysis, (2) retrieving relevant data from the database, and (3) making a decision based on the retrieved data.

2. What are the key components of the Decision QA task?

  • The input consists of a decision-making question, business rules, and a structured database (either a relational database or a labeled property graph database).
  • The output is the best decision that answers the given question (a typed sketch of this input/output follows this list).
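
As a rough illustration only (not the paper's formal definition), the task's input and output can be typed as below; the concrete `RelationalDB` and `GraphDB` stand-ins are assumptions for the sketch.

```python
from dataclasses import dataclass
from typing import Union

# Illustrative stand-ins for the two database formats; the paper's databases
# are a real relational DB and a labeled property graph DB, not Python dicts.
RelationalDB = dict   # e.g., table name -> rows
GraphDB = dict        # e.g., nodes/edges with labeled properties

@dataclass
class DecisionQAInput:
    question: str                            # the decision-making question
    business_rules: str                      # rules the decision must follow
    database: Union[RelationalDB, GraphDB]   # RDB or labeled property graph DB

@dataclass
class DecisionQAOutput:
    best_decision: str                       # e.g., a chosen location or building
```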

3. How is Decision QA different from existing QA tasks?

  • Existing QA tasks focus on knowledge-based question answering, while Decision QA requires planning and data analysis for decision-making.
  • Decision QA involves iterative reasoning, where the language model needs to make a plan, retrieve data, and potentially re-plan based on the retrieved information.

[02] DQA Benchmark

1. What is the DQA benchmark?

  • DQA is a benchmark proposed by the authors for the Decision QA task.
  • It consists of two scenarios: Locating and Building, extracted from the video games Europa Universalis IV and Victoria 3.

2. What are the key characteristics of the DQA benchmark?

  • The benchmark contains 301 pairs of decision-making questions and databases, with 200 pairs for the Locating scenario and 101 pairs for the Building scenario.
  • The databases are provided in both relational database (RDB) and labeled property graph database (GDB) formats.
  • The benchmark is designed to mimic real-world business situations and decision-making problems (one possible record layout is sketched below).
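
To make the benchmark's shape concrete, here is a minimal sketch of what one DQA pair might look like; every field name here is an assumption for illustration, since the summary only specifies that each pair couples a question with a database (in RDB and GDB form) and a simulator-derived answer.

```python
# Hypothetical layout of a single DQA benchmark pair (field names assumed).
dqa_pair = {
    "scenario": "Locating",          # "Locating" (200 pairs) or "Building" (101 pairs)
    "question": "...",               # the decision-making question
    "database": {"rdb": "...", "gdb": "..."},  # same data in both formats
    "ground_truth": "...",           # best decision recorded by the game simulator
}
```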

3. How was the DQA benchmark constructed?

  • The authors extracted specific situations from the video games Europa Universalis IV and Victoria 3, which involve decision-making tasks similar to real-world business problems.
  • They developed game simulators to record the decision results for the 301 situations, which are used as the ground-truth annotations for the DQA benchmark.

[03] PlanRAG Technique

1. What is the PlanRAG technique?

  • PlanRAG is a new retrieval-augmented generation technique proposed by the authors to address the Decision QA task.
  • It extends the iterative RAG technique by adding a planning step, where the language model first makes a plan for the necessary data analysis before retrieving relevant data and making a decision.

2. How does PlanRAG differ from existing RAG techniques?

  • Existing RAG techniques focus on knowledge-based QA tasks and do not explicitly handle the planning step required for decision-making.
  • PlanRAG introduces a planning step, where the language model examines the data schema and question to determine the necessary data analysis before retrieving and reasoning about the data.

3. What are the key steps of the PlanRAG technique?

  1. Planning: The language model generates an initial plan for the necessary data analysis.
  2. Retrieving & Answering: The language model retrieves relevant data based on the plan and uses it to reason about the decision.
  3. Re-planning: The language model assesses the current plan and, if it is insufficient, generates a new plan, then repeats the retrieving and answering steps (a minimal loop sketch follows this list).
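
Below is a minimal sketch of this loop in Python, assuming a generic `llm` callable and a `run_queries` helper for executing planned queries against the database; the actual prompts, query languages, and stopping criteria in the paper are more elaborate.

```python
def plan_rag(question, rules, db, llm, run_queries, max_iters=5):
    """Hedged sketch of the plan -> retrieve -> re-plan loop (not the paper's exact prompts)."""
    # 1. Planning: draft an initial data-analysis plan from the question and rules.
    plan = llm(f"Plan the data analysis needed to answer:\n{question}\nRules:\n{rules}")
    evidence = []
    for _ in range(max_iters):
        # 2. Retrieving & Answering: turn the plan into queries and execute them.
        queries = llm(f"Write database queries for this plan:\n{plan}")
        evidence.append(run_queries(queries, db))
        # 3. Re-planning: decide whether the gathered data suffices, or whether
        #    the plan must be revised before retrieving again.
        verdict = llm(
            f"Plan:\n{plan}\nData so far:\n{evidence}\n"
            "Is this enough to decide? Reply DECIDE or REPLAN."
        )
        if "DECIDE" in verdict:
            break
        plan = llm(f"Revise the plan given the data gathered so far:\n{evidence}")
    # Final decision grounded in the retrieved evidence.
    return llm(
        f"Business rules:\n{rules}\nRetrieved data:\n{evidence}\n"
        f"Answer the decision question: {question}"
    )
```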

[04] Experimental Results

1. How did PlanRAG perform compared to other techniques on the DQA benchmark?

  • PlanRAG significantly outperformed the state-of-the-art iterative RAG technique, improving the accuracy by 15.8% in the Locating scenario and 7.4% in the Building scenario.
  • PlanRAG also outperformed the single-turn RAG and a version of PlanRAG without re-planning, demonstrating the importance of the planning and re-planning steps.

2. How did the performance of PlanRAG vary between the Locating and Building scenarios?

  • PlanRAG was more effective in the Locating scenario than in the Building scenario.
  • This is because the Building scenario requires a longer data-analysis process, which makes planning more challenging than in the Locating scenario.

3. What were the key factors contributing to the improved performance of PlanRAG?

  • PlanRAG better gauged the difficulty of each question and performed the necessary data retrievals more systematically than the iterative RAG technique.
  • The re-planning step in PlanRAG also helped to address cases where the initial plan was insufficient, further improving the decision-making performance.