OpenAI’s New Model, Strawberry, Explained
🌈 Abstract
The article discusses "Strawberry", a new language model developed by OpenAI that appears to have significantly stronger reasoning abilities than current frontier models like ChatGPT. It highlights Strawberry's reported ability to solve math problems and complex word puzzles that current chatbots cannot reliably handle.
🙋 Q&A
[01] Strawberry: A New Language Model with Improved Reasoning Abilities
1. What are the key features of Strawberry that make it a significant advancement over current language models like ChatGPT?
- Strawberry was trained using "process supervision" rather than just "outcome supervision", meaning it was rewarded for correctly moving through each reasoning step to arrive at the answer.
- Strawberry can solve math problems and complex word puzzles (like the New York Times Connections puzzle) that current chatbots cannot reliably solve.
- Strawberry was originally developed by OpenAI to create training data for its newest foundation model, codenamed Orion, by generating a vast set of problems with step-by-step solutions.
- OpenAI plans to release a smaller, faster version of Strawberry as part of ChatGPT as soon as this fall, potentially representing a major upgrade to ChatGPT's current reasoning abilities.
2. How does Strawberry's approach to training differ from that of current language models?
- Most current language models are trained via "outcome supervision", where they are only rewarded if they get the final answer right.
- In contrast, Strawberry was trained using "process supervision", where it was rewarded for correctly working through each intermediate reasoning step on the way to the answer, not only for the answer itself (see the sketch after this list).
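A minimal Python sketch of the distinction. All names here are hypothetical illustrations (`outcome_reward`, `process_reward`, `step_is_valid`), not OpenAI's actual training code:

```python
# Toy contrast between the two reward schemes (hypothetical helpers,
# not OpenAI's training code).

def outcome_reward(final_answer: str, correct_answer: str) -> float:
    """Outcome supervision: reward depends only on the final answer."""
    return 1.0 if final_answer == correct_answer else 0.0

def process_reward(steps: list[str], step_is_valid) -> float:
    """Process supervision: credit each reasoning step that checks out."""
    if not steps:
        return 0.0
    return sum(step_is_valid(s) for s in steps) / len(steps)
```

Under the first scheme, a model that stumbles onto the right answer gets full credit; under the second, only a sound chain of steps does.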
3. What are the potential implications of Strawberry's improved reasoning abilities for the future of language models?
- Strawberry's existence implies that the next GPT model could be significantly more powerful, as Microsoft and OpenAI have been signaling.
- If Strawberry can truly solve novel math problems and complex puzzles, it would upend the current understanding of what language models can do: they are typically seen as good at transmitting existing knowledge but poor at solving problems they haven't seen before.
[02] Limitations of Current Language Models
1. Why do current language models like ChatGPT struggle with tasks like counting the number of "r"s in the word "strawberry"?
- ChatGPT doesn't actually see the word "strawberry" as a sequence of letters, but rather as a long string of numbers representing the word in an AI-readable language.
- ChatGPT doesn't have the ability to logically count the number of letters in a word, as it makes guesses based on statistical likelihood rather than true reasoning.
2. What are the limitations of current language models in solving math problems and reasoning tasks?
- Current language models are trained via "outcome supervision": they are rewarded only for getting the final answer right, not for correctly working through the intermediate reasoning steps.
- As a result, they often hallucinate or make erratic reasoning mistakes when faced with math problems and other reasoning tasks they haven't seen before (a worked illustration follows this list).
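A hypothetical worked example of why rewarding only the outcome can mislead: two arithmetic slips can cancel out, so a trace with broken reasoning still lands on the right final answer.

```python
# Claimed solution to 3 * 4 + 5 (true answer: 17). Both steps below
# are flawed, yet the final answer comes out right.
steps = [
    ("3 * 4", 14),   # arithmetic slip: 3 * 4 is actually 12
    ("14 + 3", 17),  # wrong operand (+3 instead of +5) cancels the slip
]

outcome_ok = steps[-1][1] == eval("3 * 4 + 5")                   # True
process_ok = all(eval(expr) == claim for expr, claim in steps)   # False
print(outcome_ok, process_ok)
# Outcome supervision rewards this trace; process supervision flags step 1.
```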