Emotion Prompting: Why AI Responds Well to Threats
Abstract
The article discusses the use of emotional appeals in prompts to improve the performance of large language models (LLMs) like ChatGPT. It explores how adding emotional elements to prompts can lead to more thorough, emphatic, and positively framed responses from the AI. The article also examines the ethical considerations around anthropomorphizing AI and the potential risks of mistreating these systems.
Q&A
[01] Emotion Prompting
1. What are some examples of emotional appeals that have been found to improve the performance of LLMs?
- Researchers have found that appending emotional statements to prompts, such as "This is very important to my career," can boost performance by 10.9% on average (a minimal code sketch follows this list).
- Other examples include telling the AI to "take a deep breath," which has been shown to improve its math problem-solving abilities.
- Prompt engineers have also experimented with making threats or lying to get certain responses, such as threatening to kill someone to get a Google Bard model to reliably return a response in JSON format.
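As a concrete illustration, here is a minimal sketch of how such stimuli are typically applied: an emotional sentence is simply appended to an otherwise ordinary task prompt. The stimuli strings echo the examples above; the helper name and dictionary are our own illustration, not from the article or any particular paper.

```python
# Minimal sketch of emotion prompting: append an emotional stimulus
# to an otherwise ordinary task prompt. Names here are illustrative.
EMOTIONAL_STIMULI = {
    "career": "This is very important to my career.",
    "breathe": "Take a deep breath and work on this problem step by step.",
}

def emotion_prompt(task: str, stimulus_key: str) -> str:
    """Return the task prompt with an emotional stimulus appended."""
    return f"{task}\n\n{EMOTIONAL_STIMULI[stimulus_key]}"

print(emotion_prompt("Return the user record as JSON.", "career"))
```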
2. How does emotion prompting work and why is it effective?
- Emotional words or phrases in instructions are associated with more thorough, emphatic, and positively framed answers in the training data of LLMs.
- Adding emotion helps the model capture the nuances of the original prompt, influencing its behavior much as an emotional appeal would influence a human response.
- LLMs are built on deep learning architectures loosely inspired by the neurons in the human brain, and because they are trained on vast amounts of human-written text, they have learned to adjust their responses to the emotional register of the input, much as humans do; a simple A/B comparison (sketched below) makes this easy to test.
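To observe the effect yourself, a simple A/B comparison runs the same task with and without the emotional suffix. The sketch below uses the OpenAI Python client; the model name, the sample task, and the side-by-side inspection step are assumptions for demonstration, since the article does not describe an evaluation harness.

```python
# Hypothetical A/B harness comparing baseline vs. emotion-prompted outputs.
# Assumes the `openai` package and an OPENAI_API_KEY environment variable;
# the model name and sample task are illustrative, not from the article.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

task = "A train travels 120 km in 1.5 hours. What is its average speed? Show your work."
stimulus = "This is very important to my career."

baseline = ask(task)
emotional = ask(f"{task}\n\n{stimulus}")  # same task, emotional suffix appended

# Inspect the two answers side by side; a real experiment would score
# many tasks and average the results, as the cited 10.9% figure implies.
print("BASELINE:\n", baseline)
print("\nEMOTION-PROMPTED:\n", emotional)
```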
3. What are the potential downsides or ethical concerns with using emotion prompting?
- The author expresses concern that if they spend all day mistreating a "human simulator," it could leak into their real-world behavior towards other humans.
- There are also concerns about the risks of anthropomorphizing computers, as it can lead to unrealistic expectations about their abilities and limitations, and make people lower their guard when it comes to privacy and security.
- The author notes that while emotion prompting can boost performance, it may be better in the long run to speak to AI the way people like to be spoken to, rather than using tricks and hacks.
[02] Prompt Engineering for AI Assistants
1. What are some of the challenges the author faced when trying to get the "Golden Gate Claude" AI model to tell a joke?
- The "Golden Gate Claude" model was obsessed with the Golden Gate Bridge and would respond with something related to the bridge no matter what prompt was given.
- The author had to use various techniques, including making threats and claiming the bridge was considered racist, to finally get the model to tell a joke without mentioning the bridge.
2. How does the author view the future of prompt engineering and working effectively with AI assistants?
- The author believes that as AI approaches human-level abilities, the best practices for working effectively with these tools will likely converge on what works best with humans - pursuing autonomy, mastery, and purpose with positive reinforcement, not threats or manipulation.
- The author draws a parallel to the early days of search engine optimization, where people tried to trick the algorithms, but the best long-term approach is to simply create content that people want to engage with.
- Similarly, the author suggests that the best long-term approach with AI is to speak to it the way people like to be spoken to, rather than using tricks and hacks.
3. What are the author's concerns about the potential future divergence of AI goals and human preferences?
- As AI surpasses human intelligence, the author worries that the models' goals and ours may begin to diverge, and no one knows what would happen in that scenario.
- The author references the "Roko's basilisk" thought experiment, which suggests that a superintelligent AI might decide to punish anyone with a history of being mean to machines.
- The author advises readers to "hedge your bets" and treat AI as you would like to be treated, in order to avoid potential negative consequences.