Fields that I reference when thinking about AI takeover prevention — LessWrong
🌈 Abstract
The article discusses the author's perspective on drawing analogies from other fields to inform the design and evaluation of safety measures for mitigating catastrophic risks from AI, especially the risk of AI takeover due to egregious misalignment.
🙋 Q&A
[01] Insider Threat Mitigation
1. What are the key similarities and differences between insider threat mitigation and AI control?
- The author sees insider threat mitigation as the field most closely analogous to AI control, as both involve concerns about trusted insiders (employees/AIs) potentially abusing their access to cause harm.
- Similarities include the need to develop secure processes and workflows to maintain reasonable security while allowing necessary access.
- Key differences are that insider threat mitigation is less focused on adversarial settings and lacks systematic approaches for evaluating whether mitigation measures are sufficient, in contrast to the author's adversarial approach to AI control.
2. What resources does the author recommend for insider threat mitigation?
- The author recommends Google's "Building Secure and Reliable Systems" as a good resource on insider threat mitigation approaches used in software development.
- The author also mentions the GiveDirectly article on falling victim to fraud as an interesting resource, though it offers little in the way of specific proposals.
[02] Computer Security
1. How does the author view the relevance of computer security to AI risk?
- The author sees computer security as very directly relevant to AI risk, as it deals with adversarial problems involving computers.
- However, the author argues that the computer security mindset is not very close to the mindset needed for reasoning about preventing AI takeover.
- The key disanalogy is that computer security mostly deals with systems composed of many individually understandable parts, while the AI risk setting involves systems that are far less understood and have vastly fewer moving parts.
2. What are the author's thoughts on the methodologies used in computer security?
- The author notes that computer security practitioners do not have very unified methodologies for assessing system security, aside from red-teaming.
- More theoretical work in computer security takes an adversarial approach similar to the author's, but its methodology relies on mathematical guarantees that are largely unavailable in the AI setting, so the author sees it as a poor fit.
[03] Adversarial Risk Analysis
1. How does the author view the applicability of adversarial risk analysis to AI control?
- The author sees adversarial risk analysis as the field whose methodology is most similar to their work on AI control, since it involves quantitative modeling of the costs and risks associated with safety strategies (a rough illustrative sketch of this style of modeling appears at the end of this section).
- However, the author believes the case for adversarial risk analysis is stronger for problems like allocating security resources, where the dynamics are better understood, than for mitigating AI risk, where the adversary's capabilities and strategies are far less well known.
2. What is the author's assessment of the use of adversarial risk analysis in practice?
- The author notes that adversarial risk analysis is a relatively obscure field and says they are not aware of it being used in practice to argue that safety obligations have been met.
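
To make the comparison concrete, below is a minimal, purely illustrative sketch of the style of quantitative modeling used in adversarial risk analysis: a defender chooses among safety strategies while assuming the adversary best-responds to whichever strategy is deployed. The strategy names, probabilities, and costs are invented for illustration and do not come from the article.

```python
# A hypothetical adversarial-risk-analysis sketch: pick the defender strategy
# that minimizes expected loss, assuming the adversary chooses the attack with
# the highest chance of success against that strategy. All numbers below are
# assumptions made up for illustration, not estimates from any real model.

# Probability that each attack succeeds against each defender strategy.
success_prob = {
    ("baseline_monitoring", "subtle_sabotage"): 0.30,
    ("baseline_monitoring", "overt_exfiltration"): 0.05,
    ("intensive_auditing", "subtle_sabotage"): 0.10,
    ("intensive_auditing", "overt_exfiltration"): 0.02,
}

strategy_cost = {"baseline_monitoring": 1.0, "intensive_auditing": 5.0}
damage_if_success = 100.0  # loss to the defender if any attack succeeds


def expected_loss(strategy: str) -> float:
    """Defender's expected loss under `strategy`, assuming the adversary
    picks the attack most likely to succeed against it."""
    attacks = [a for (s, a) in success_prob if s == strategy]
    worst_case = max(success_prob[(strategy, a)] for a in attacks)
    return strategy_cost[strategy] + worst_case * damage_if_success


if __name__ == "__main__":
    for s in strategy_cost:
        print(f"{s}: expected loss = {expected_loss(s):.1f}")
    print("chosen strategy:", min(strategy_cost, key=expected_loss))
```

The appeal here is the structure rather than the numbers: the same expected-loss comparison is much harder to justify for AI risk, where the success probabilities of the adversary's strategies are not well known.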
[04] Safety Engineering
1. How does the author view the relevance of safety engineering to AI risk?
- The author sees safety engineering as the field that takes the most methodical approach to evaluating the safety of engineered systems.
- However, the author disagrees with the claim that safety engineering is the most relevant analogy, since failures in safety engineering typically arise from unexpected interactions between system components rather than from the adversarial AI failures the author is concerned about.
- The author also notes that safety engineering is generally not focused on adversarial settings, whereas adversarial dynamics are central to the author's approach to AI control.
[05] Physical Security
1. What are the author's thoughts on the relevance of physical security to AI control?
- The author sees physical security as a potentially interesting analogy, as it is an adversarial setting that is less complex than computer security.
- However, the author admits to not knowing much about physical security, beyond having read some regulations from the US Nuclear Regulatory Commission.
- The author is unsure if physical security will provide a useful general methodology that can be adopted for AI control, given the differences in the settings.
[06] Historical Perspectives on Power Structures
1. How have the author's readings on the history of human power structures informed their views on AI takeover?
- The author's reading on the history of human power structures has suggested some high-level takeaways that shape their views on how feasible it is for one group to control another in highly asymmetric situations, and on how power structures might shift with the advent of powerful AI.
- However, the author notes that this historical reading did not provide much in the way of specific lessons for AI control techniques, as the settings are very different.