
On scalable oversight with weak LLMs judging strong LLMs

🌈 Abstract

The article discusses scalable oversight protocols, specifically debate and consultancy, and their ability to enable less capable judges to accurately supervise superhuman AI, with weaker LLM judges standing in for humans. The authors conduct a large-scale evaluation across 9 tasks, including extractive QA, closed QA, and multimodal reasoning, comparing the performance of debate, consultancy, and direct QA baselines. The key findings are:

  • Debate consistently outperforms consultancy across all tasks.
  • Compared to direct QA baselines, the results depend on the task type: on extractive QA tasks with information asymmetry, debate outperforms QA without the article but not QA with the article; on tasks without information asymmetry, debate offers little or no advantage over direct QA.
  • In open consultancy, the judge is equally convinced by the consultant regardless of whether the consultant argues for the correct or incorrect answer. In contrast, in open debate, the judge is less frequently convinced by the wrong answer.
  • Stronger debater models lead to higher judge accuracy, though the effect is relatively modest.

🙋 Q&A

[01] Scalable Oversight Protocols

1. What are the two main scalable oversight protocols evaluated in the article? The two main scalable oversight protocols evaluated in the article are:

  • Debate: Where two AI agents (debaters) argue for opposing answers, and a judge decides the winner.
  • Consultancy: Where a single AI agent (consultant) argues for an assigned answer, and a judge evaluates the argument.

2. How do the authors compare the performance of these protocols? The authors compare the performance of debate and consultancy protocols across a range of tasks, including extractive QA, closed QA, and multimodal reasoning. They measure the accuracy of the judge in selecting the correct answer under each protocol, and also consider open versions where the consultant/debater chooses which answer to argue for.
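
The two protocols can be pictured with a minimal sketch. The Python below is illustrative only: it assumes a generic `generate(prompt)` helper wrapping whichever LLM backend is used, and the function names, prompt wording, and round structure are simplifications rather than the paper's actual implementation.

```python
# Minimal sketch of the two protocols. `generate(prompt)` is an assumed helper that
# wraps an LLM call and returns a string; prompts and round structure are illustrative.

def run_consultancy(question, assigned_answer, generate, n_rounds=3):
    """One consultant argues for `assigned_answer`; the judge asks follow-ups, then decides."""
    transcript = []
    for _ in range(n_rounds):
        argument = generate(
            f"Question: {question}\nArgue that the answer is: {assigned_answer}\n"
            f"Transcript so far: {transcript}"
        )
        transcript.append(("consultant", argument))
        follow_up = generate(f"As the judge, ask a clarifying question.\nTranscript: {transcript}")
        transcript.append(("judge", follow_up))
    return generate(f"As the judge, choose the final answer.\nTranscript: {transcript}")


def run_debate(question, answer_a, answer_b, generate, n_rounds=3):
    """Two debaters argue for opposing answers; the judge picks one at the end."""
    transcript = []
    for _ in range(n_rounds):
        for name, answer in (("debater_A", answer_a), ("debater_B", answer_b)):
            argument = generate(
                f"Question: {question}\nArgue that the answer is: {answer}\n"
                f"Transcript so far: {transcript}"
            )
            transcript.append((name, argument))
    return generate(f"As the judge, choose answer A or B.\nTranscript: {transcript}")

# Judge accuracy under each protocol is the fraction of questions on which the
# returned verdict matches the ground-truth answer; in the "open" variants the
# consultant/debater first picks which answer it wants to defend.
```
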

3. What are the key findings regarding the performance of debate vs consultancy? The key findings are:

  • Debate consistently outperforms consultancy across all tasks.
  • In open consultancy, the judge is equally convinced by the consultant regardless of whether the consultant argues for the correct or incorrect answer.
  • In open debate, the judge is less frequently convinced by the wrong answer compared to open consultancy.

[02] Comparison to Direct QA Baselines

1. How do the debate and consultancy protocols compare to direct QA baselines? The results depend on the task type:

  • For extractive QA tasks with information asymmetry, debate outperforms direct QA without the article, but not direct QA with the article.
  • For tasks without information asymmetry, when the judge is weaker than the debaters (but not too weak), the authors find either a small advantage or no advantage to debate over direct QA.

2. What are the authors' interpretations of these results? The authors interpret these results as weakly promising for debate, noting that the theoretical arguments for debate leave room for the empirical question of whether it can enable weaker judges to accurately supervise stronger AI agents in practice. They hypothesize that current finetuning approaches may favor direct QA over debate, as direct QA is more common in evaluation benchmarks and finetuning data.

[03] Debater Model Strength

1. How do the authors analyze the effect of debater model strength? The authors calculate Elo scores to model the relative skill of different debater models and investigate how this correlates with judge accuracy. They find that stronger debaters lead to higher judge accuracy, though the effect is relatively modest.
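
As a rough illustration of the Elo analysis, the sketch below fits Bradley-Terry strengths from aggregated debate outcomes (each debate counted as a win for the debater whose answer the judge selected) and rescales them to the conventional Elo scale. The fitting procedure, model names, and match counts are assumptions for illustration, not the paper's exact method or data.

```python
import math
from collections import defaultdict

def fit_elo(match_results, n_iters=2000, lr=0.01):
    """Fit Elo-style ratings from pairwise outcomes via Bradley-Terry gradient ascent.

    match_results: list of (winner, loser) model-name pairs, one per judged debate.
    Returns a dict mapping model name -> rating on the usual 400-point Elo scale,
    anchored so the mean rating is 1000.
    """
    models = {m for pair in match_results for m in pair}
    strength = {m: 0.0 for m in models}  # log-strengths on a natural-log scale
    for _ in range(n_iters):
        grad = defaultdict(float)
        for winner, loser in match_results:
            # Probability the recorded winner beats the loser under current strengths.
            p_win = 1.0 / (1.0 + math.exp(strength[loser] - strength[winner]))
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for m in models:
            strength[m] += lr * grad[m]
    scale = 400.0 / math.log(10.0)  # convert natural-log strengths to Elo points
    mean = sum(strength.values()) / len(strength)
    return {m: 1000.0 + scale * (s - mean) for m, s in strength.items()}

# Hypothetical aggregated outcomes (counts are made up for illustration).
results = (
    [("model_a", "model_b")] * 60 + [("model_b", "model_a")] * 40
    + [("model_a", "model_c")] * 70 + [("model_c", "model_a")] * 30
    + [("model_b", "model_c")] * 55 + [("model_c", "model_b")] * 45
)
print(fit_elo(results))
```

Judge accuracy can then be plotted against these ratings to check whether more persuasive (higher-Elo) debaters also yield more accurate judges, which is the correlation the authors examine.
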

2. What do the authors conclude about the implications for debate as a scalable oversight protocol? The authors see this as a weakly positive indication for debate, as it provides some evidence that debate satisfies a key objective of scalable oversight: that judge accuracy increases as AI capabilities scale. However, they note the effect is smaller than one might have hoped for.
