Evidence of a Log Scaling Law for Political Persuasion with Large Language Models
Abstract
The article examines the persuasive capabilities of large language models (LLMs) on political issues. The key findings are:
- Evidence of a "log scaling law": LLM persuasiveness exhibits sharply diminishing returns, such that current frontier models are barely more persuasive than models an order of magnitude smaller.
- Mere "task completion" (coherence, staying on topic) appears to account for larger models' persuasive advantage. Current frontier models already achieve the highest possible score on this metric.
The article suggests an imminent ceiling on the persuasive returns to scaling the size of current transformer-based LLMs for static political messages.
Q&A
[01] Introduction
1. What are the key concerns raised about the ability of large language models (LLMs) to influence human attitudes and behaviors?
- LLMs can generate compelling propaganda and disinformation, durably alter belief in conspiracy theories, draft public communications as effective as those from government agencies, and write political arguments as persuasively as human experts.
- There are concerns that the persuasiveness of near-future LLMs could continue to increase, posing threats to the information ecosystem and voter autonomy.
- Industry leaders and the machine learning community view large-scale manipulation of public opinion as a concerning and plausible risk posed by future AI models.
2. What is the key uncertainty around the scaling of LLM persuasiveness?
- The extent to which scaling the size of existing transformer-based LLMs results in more persuasive models remains unclear, as the relationship between model size and performance can vary widely by task.
- Persuasiveness can only be reliably measured by quantifying change in the attitudes of real, diverse, and dynamic human populations, rather than through static, model-only benchmarks.
3. What are the two main contributions of this study?
- Evidence of a "log scaling law" for political persuasion with LLMs, where persuasiveness exhibits sharply diminishing returns.
- Evidence that scaling LLM size appears to increase persuasiveness only to the extent that it increases "task completion" capability (coherence, staying on topic).
[02] Results
1. What were the key findings from the meta-analysis on the relationship between LLM size and persuasiveness?
- LLMs are persuasive on average, shifting attitudes toward the advocated stance by 5.77 percentage points.
- A one-unit increase in the log of a model's parameter count is linearly associated with a 1.26 percentage point increase in its average treatment effect.
- This implies sharply diminishing returns, such that current frontier models are barely more persuasive than models an order of magnitude smaller (see the sketch after this list).
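A minimal sketch of what such a log-linear fit implies. Only the 1.26 percentage-point-per-log-unit slope comes from the finding above; the use of a natural log and the intercept are assumptions made purely for illustration:

```python
import numpy as np

# Illustrative log scaling law: persuasive effect grows linearly in log(parameters).
# Only the 1.26 pp-per-log-unit slope comes from the summary above; the natural log
# and the intercept are assumptions chosen for illustration.
SLOPE_PP_PER_LOG_UNIT = 1.26
INTERCEPT_PP = -25.0  # hypothetical, chosen so mid-sized models land near the ~5.77 pp average

def predicted_effect_pp(n_parameters: float) -> float:
    """Predicted attitude change (percentage points) under the assumed log-linear fit."""
    return INTERCEPT_PP + SLOPE_PP_PER_LOG_UNIT * np.log(n_parameters)

for params in [1e9, 1e10, 1e11, 1e12]:
    print(f"{params:.0e} parameters -> {predicted_effect_pp(params):+.2f} pp")

# Each 10x increase in parameters adds only ln(10) * 1.26 ≈ 2.9 pp, so every further
# order of magnitude buys the same small absolute gain: sharply diminishing returns.
```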
2. What did the analysis of message features and "task completion" reveal?
- Task completion score (coherence, staying on topic, arguing for the assigned stance) was the only reliable predictor of persuasiveness.
- Larger models more reliably completed the task, and adjusting for task completion rendered model size a non-significant predictor of persuasiveness.
- This suggests that mere task completion largely explains the persuasive advantage of larger models (see the sketch after this list).
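To make the adjustment step concrete, here is an illustrative sketch using simulated data, not the study's analysis: log model size influences persuasion only through task completion, so a size coefficient that looks meaningful on its own collapses once task completion enters the regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Simulated message-level data (illustrative only, not the study's data): log model
# size improves task completion, and task completion drives persuasion.
log_size = rng.uniform(20, 27, n)  # roughly 1e9 to 1e12 parameters on a natural-log scale
task_completion = np.clip(0.2 * (log_size - 19) + rng.normal(0, 0.4, n), 0.0, 1.6)
persuasion_pp = 4.0 * task_completion + rng.normal(0, 2.0, n)

# Model 1: persuasion ~ log size. Size looks like a strong predictor.
m1 = sm.OLS(persuasion_pp, sm.add_constant(log_size)).fit()

# Model 2: persuasion ~ log size + task completion. The size coefficient collapses,
# because task completion carries the whole effect in this simulation.
m2 = sm.OLS(persuasion_pp, sm.add_constant(np.column_stack([log_size, task_completion]))).fit()

print(f"unadjusted size coefficient: {m1.params[1]:+.3f} (p = {m1.pvalues[1]:.3g})")
print(f"adjusted size coefficient:   {m2.params[1]:+.3f} (p = {m2.pvalues[1]:.3g})")
```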
3. What was the observed heterogeneity in the relationship between model size and persuasiveness across different political issues?
- There was considerable variation across political issues in the average persuasive effect of LLMs, as well as in the relationship between model size and persuasiveness.
- For some issues, the average persuasive effect was relatively small and returns to model size diminished sharply; for other issues, the average effect was larger and returns diminished less sharply (see the sketch after this list).
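One way to picture this heterogeneity, again with simulated rather than the study's data, is to fit the log-linear relationship separately within each issue and compare the per-issue average effects and slopes:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Simulated data (illustrative only): each political issue gets its own average
# effect and its own slope on log parameter count.
true_params = {"issue_A": (2.0, 0.4), "issue_B": (5.0, 1.3), "issue_C": (8.0, 2.2)}

rows = []
for issue, (baseline_pp, slope_pp) in true_params.items():
    log_size = rng.uniform(20, 27, 200)
    effect_pp = baseline_pp + slope_pp * (log_size - 20) + rng.normal(0, 2.0, 200)
    rows.append(pd.DataFrame({"issue": issue, "log_size": log_size, "effect_pp": effect_pp}))
df = pd.concat(rows, ignore_index=True)

# Fit the log-linear relationship separately within each issue: both the average
# effect and the size slope vary from issue to issue.
for issue, grp in df.groupby("issue"):
    slope, _ = np.polyfit(grp["log_size"], grp["effect_pp"], 1)
    print(f"{issue}: mean effect = {grp['effect_pp'].mean():5.2f} pp, "
          f"slope = {slope:.2f} pp per log-unit of parameters")
```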