Can AI be superhuman? Flaws in top gaming bot cast doubt
Abstract
The article examines vulnerabilities discovered in one of the most successful AI systems: a bot that plays the board game Go and can beat the world's best human players. The study raises questions about whether more general AI systems will suffer from similar weaknesses that could compromise their safety and reliability, and even their claim to be 'superhuman'.
Q&A
[01] Weaknesses in Successful AI Systems
1. What are the key findings of the research on the vulnerabilities of the Go-playing AI system?
- The research found that adversarial AI bots could regularly defeat KataGo, the strongest open-source Go-playing AI system, even though the adversarial bots were otherwise weak players (a toy sketch of this style of adversary training follows this list).
- Humans could also learn the adversarial bots' tricks and use them to beat KataGo themselves.
- These weaknesses raise doubts about whether more general AI systems suffer from similar vulnerabilities, with consequences for their safety, their reliability, and even their claim to be 'superhuman'.
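To make the attack concrete, below is a deliberately toy sketch of the general recipe: train an adversarial policy against a frozen victim. Every name and rule here (TinyPolicy, the random "positions", the reward) is a hypothetical stand-in for illustration; the actual study trained its adversaries with an MCTS-based Go pipeline, not the REINFORCE-style update used here.

```python
# Toy sketch: train an adversarial policy against a frozen victim.
# Hypothetical stand-ins throughout -- random vectors play the role of
# board positions, and the reward rule is invented for illustration.
import torch
import torch.nn as nn

BOARD = 9 * 9  # flattened toy 9x9 board

class TinyPolicy(nn.Module):
    """Maps a board state to logits over moves."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(BOARD, 128), nn.ReLU(), nn.Linear(128, BOARD))

    def forward(self, x):
        return self.net(x)

victim = TinyPolicy()            # stands in for KataGo; never updated
victim.requires_grad_(False)
adversary = TinyPolicy()         # the only network being trained
opt = torch.optim.Adam(adversary.parameters(), lr=1e-3)

for step in range(500):
    state = torch.rand(1, BOARD)                       # fake position
    dist = torch.distributions.Categorical(logits=adversary(state))
    move = dist.sample()
    # Invented reward: +1 when the adversary picks a move the frozen
    # victim itself rates above average, -1 otherwise.
    victim_scores = victim(state)
    reward = 1.0 if victim_scores[0, move] > victim_scores.mean() else -1.0
    loss = -(dist.log_prob(move) * reward).sum()       # REINFORCE update
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The structural point is that only the adversary receives gradient updates; the frozen victim is treated as part of the environment, which is how an otherwise weak adversary can specialize in a single narrow exploit.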
2. What were the three defensive strategies tested by the researchers to address the vulnerabilities in the Go-playing AI system?
- The first defense had KataGo train on examples of the board positions involved in the attacks, but an adversarial bot could still beat the updated version 91% of the time.
- The second strategy, iterative adversarial training, repeatedly trained new adversaries against updated versions of KataGo and fine-tuned KataGo on each round of attacks (see the sketch after this list); it too failed to produce an unbeatable version, with the final adversary winning 81% of the time.
- The third strategy built a new Go-playing AI system using a vision transformer (ViT) instead of a convolutional neural network (CNN), but it was also vulnerable, with a new attack letting the adversarial bot win 78% of the time.
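As a rough illustration of the second defense's shape, here is a runnable toy of the patch-and-attack loop. A "policy" is a single scalar strength, and all three helper functions are invented stand-ins, not anything from the study; the only faithful part is the structure, where each round patches the newest exploit and a fresh adversary then finds another.

```python
# Toy model of iterative adversarial training. All quantities are
# invented: a "policy" is one scalar strength, and the win-rate model
# is a simple logistic curve, not real Go games.
import math
import random

def train_adversary_against(victim):
    """A fresh adversary always finds some exploit in the current victim."""
    return victim + random.uniform(0.05, 0.2)

def fine_tune_on_attacks(victim, adversary):
    """Fine-tuning closes part, but not all, of the exposed gap."""
    return victim + 0.8 * (adversary - victim)

def win_rate(adversary, victim):
    """Toy logistic map from strength gap to adversary win rate."""
    return 1.0 / (1.0 + math.exp(-30.0 * (adversary - victim)))

random.seed(0)
victim = 1.0
for round_idx in range(5):          # some number of patch-and-attack rounds
    adversary = train_adversary_against(victim)
    print(f"round {round_idx}: adversary wins "
          f"{win_rate(adversary, victim):.0%} of games")
    victim = fine_tune_on_attacks(victim, adversary)
```

In this toy, as in the reported results, the adversary's win rate never falls to zero: each fine-tune closes the last gap, but a new adversary opens the next one.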
3. What do the results suggest about the possibility of creating AI that comprehensively outpaces human capabilities?
- The results suggest that the vulnerabilities found in the Go-playing AI system will be difficult to eliminate, and if such issues persist in a domain as simple as Go, there is little near-term prospect of patching similar issues in more complex AI systems such as ChatGPT.
- The most crucial takeaway, however, is that we do not fully understand the AI systems we build today, so the implications for creating AI that comprehensively outpaces human capabilities remain unclear.
[02] Implications for AI Systems
1. What are the broader implications of the findings for AI systems, including large language models like ChatGPT?
- The vulnerabilities proved hard to eliminate even in a domain as narrow as Go, suggesting that analogous weaknesses in far more complex systems such as ChatGPT will be harder still to patch in the near term.
- The results raise questions about the safety and reliability of AI systems, including their ability to behave as desired and be trusted by people.
2. How do the findings challenge the notion of 'superhuman' capabilities in AI systems?
- The findings challenge the notion of 'superhuman' AI capabilities: not only did the adversarial bots beat expert Go-playing systems, but humans could adopt the bots' tactics and beat those systems themselves.
- The researchers have started using the term 'typically superhuman' to describe the strong Go AI systems, acknowledging that they are not 'superhuman in the worst cases'.