Tiny but mighty: The Phi-3 small language models with big potential
Abstract
The article discusses how Microsoft researchers developed a new approach to training small language models (SLMs) that offer many of the same capabilities as large language models (LLMs) in a much smaller package. The key points are:
- Microsoft researchers were inspired by how children learn language to develop a training approach that uses high-quality, carefully curated data instead of just raw web data. This allowed them to create more capable SLMs.
- The new Phi-3 family of open SLMs outperforms models of the same size, and even larger models, across various benchmarks that measure language, coding, and math capabilities.
- SLMs offer advantages like being able to run on devices without an internet connection, making AI more accessible in areas with limited connectivity. They are well-suited for simpler tasks, while LLMs excel at complex reasoning over large amounts of information.
- Microsoft is making the first Phi-3 model, Phi-3-mini, publicly available, with more models in the family coming soon. A range of model sizes lets organizations choose the language model that best fits their specific needs and resources.
Q&A
[01] Developing Capable Small Language Models
1. What inspired Microsoft researchers to develop a new approach to training small language models?
- Microsoft researcher Ronen Eldan was inspired by how his young daughter learned language while he was reading her bedtime stories. This led him to wonder how much an AI model could learn using only words a 4-year-old could understand.
- This insight prompted Microsoft researchers to explore training small language models on carefully curated, high-quality data instead of just raw web data, in order to create more capable SLMs.
2. How did Microsoft researchers create the "TinyStories" and "CodeTextbook" datasets that paved the way for the Phi-3 SLMs?
- For TinyStories, researchers started with a list of 3,000 words and asked a large language model to write a children's story using one noun, one verb, and one adjective from the list, repeating the process until they had millions of short stories.
- For CodeTextbook, researchers carefully selected and filtered publicly available data, such as educational materials and textbook-like content, to create a high-quality dataset for training the SLMs.
- The researchers used a sophisticated prompting and seeding approach, along with repeated filtering, to build up a large enough corpus of high-quality data to train the more capable Phi-3 SLMs.
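The seed-then-filter loop described above can be sketched in a few lines. This is a minimal illustration, not the researchers' pipeline: it assumes, as in TinyStories, that each prompt combines one noun, one verb, and one adjective from a fixed word list. The word lists, the prompt template, and the `make_prompts` and `filter_stories` helpers are all hypothetical, and no model is called; hand-written stories stand in for LLM output.

```python
import random

# Hypothetical toy vocabulary; the real TinyStories list had about 3,000 words.
NOUNS = ["dog", "ball", "tree", "cake", "boat"]
VERBS = ["jump", "find", "share", "climb", "sing"]
ADJECTIVES = ["happy", "tiny", "red", "brave", "soft"]

# Hypothetical prompt template, not the wording the researchers used.
TEMPLATE = (
    "Write a short story that a 4-year-old could understand. "
    "Use the words '{noun}', '{verb}' and '{adj}'."
)


def make_prompts(n, seed=0):
    """Build n (seed_words, prompt) pairs, each with one noun, one verb and one adjective."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    prompts = []
    for _ in range(n):
        noun, verb, adj = rng.choice(NOUNS), rng.choice(VERBS), rng.choice(ADJECTIVES)
        prompts.append(((noun, verb, adj), TEMPLATE.format(noun=noun, verb=verb, adj=adj)))
    return prompts


def filter_stories(candidates, max_words=200):
    """Toy filter: drop duplicates, over-long stories, and stories missing a seed word."""
    seen = set()
    kept = []
    for words, story in candidates:
        text = story.lower()
        if text in seen or len(text.split()) > max_words:
            continue
        # Crude substring check, so "jumped" still counts as using "jump".
        if not all(w in text for w in words):
            continue
        seen.add(text)
        kept.append(story)
    return kept


prompts = make_prompts(3)
print(prompts[0][1])

# In a real pipeline an LLM would write each story; these are hand-written stand-ins.
candidates = [
    (("dog", "jump", "happy"), "The happy dog liked to jump over the log."),
    (("dog", "jump", "happy"), "The happy dog liked to jump over the log."),  # duplicate
    (("cake", "share", "red"), "Mia had a red cake."),  # never uses "share"
]
print(filter_stories(candidates))  # → ['The happy dog liked to jump over the log.']
```

Seeding each prompt with a random word combination pushes the model toward varied stories, while the filter pass is where quality rules would accumulate; running the filter repeatedly with stricter rules mirrors the "repeated filtering" idea in spirit only.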
3. What advantages do the Phi-3 SLMs offer compared to traditional large language models?
- Phi-3 SLMs can deliver many of the same capabilities as LLMs in a much smaller package that requires fewer computing resources.
- SLMs are well-suited for simpler tasks and can run on devices without an internet connection, making AI more accessible in areas with limited connectivity.
- SLMs offer potential solutions for regulated industries that need high-quality results while keeping data on their own premises.
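A back-of-the-envelope calculation shows why a smaller package matters for on-device use: the memory needed just to hold a model's weights is roughly the parameter count times the bits stored per parameter. The sketch below assumes Phi-3-mini's size of about 3.8 billion parameters and compares it with a hypothetical 70-billion-parameter model; the `weight_memory_gb` helper is illustrative, and it ignores activations, the KV cache, and runtime overhead, so real memory use is higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, the KV cache, and runtime overhead, so real usage is higher.
    """
    return num_params * bits_per_param / 8 / 1e9


PHI3_MINI_PARAMS = 3.8e9   # Phi-3-mini: about 3.8 billion parameters
LARGE_MODEL_PARAMS = 70e9  # hypothetical large model, for comparison only

for label, params in [("Phi-3-mini", PHI3_MINI_PARAMS), ("70B model", LARGE_MODEL_PARAMS)]:
    for bits in (16, 4):  # 16-bit floats vs. 4-bit quantized weights
        print(f"{label} @ {bits}-bit: {weight_memory_gb(params, bits):.1f} GB")

# Prints:
# Phi-3-mini @ 16-bit: 7.6 GB
# Phi-3-mini @ 4-bit: 1.9 GB
# 70B model @ 16-bit: 140.0 GB
# 70B model @ 4-bit: 35.0 GB
```

Under these assumptions, a 4-bit Phi-3-mini fits in a couple of gigabytes, within reach of phones and laptops, while a much larger model needs data-center hardware even when quantized.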
[02] Choosing the Right Language Model
1. How do the capabilities of small language models (SLMs) and large language models (LLMs) differ?
- LLMs excel at complex reasoning over large amounts of information due to their greater capacity and training on much larger datasets.
- SLMs are better suited for simpler tasks, such as summarizing documents or generating basic content, and for cases that don't require extensive reasoning or that need a quick response.
- There is still a gap between the intelligence of SLMs and what the largest LLMs in the cloud can achieve, but SLMs offer unique advantages for edge computing and accessibility.
2. How do Microsoft and its customers plan to use a portfolio of language models, including both SLMs and LLMs?
- Microsoft is already using suites of models, where large language models act as routers to direct certain queries that require less computing power to small language models.
- Customers may choose to "offload" some simpler tasks to SLMs, while reserving LLMs for more complex reasoning and analysis.
- The goal is not to replace LLMs with SLMs, but to have a portfolio of models that can be selected based on the specific needs and resources of the organization and the task at hand.
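The routing pattern described above, where one model decides which queries are simple enough for an SLM, can be sketched as a small dispatcher. This is a hypothetical illustration: Microsoft's actual routing logic is not described here, so a keyword-and-length heuristic stands in for the LLM's routing decision, and `make_router`, `toy_is_simple`, `fake_slm`, and `fake_llm` are all made-up names.

```python
from typing import Callable

Handler = Callable[[str], str]


def make_router(is_simple: Callable[[str], bool], small_model: Handler, large_model: Handler) -> Handler:
    """Return a function that sends simple queries to the small model and the rest to the large one."""
    def route(query: str) -> str:
        return small_model(query) if is_simple(query) else large_model(query)
    return route


def toy_is_simple(query: str) -> bool:
    """Hypothetical stand-in for an LLM router: short queries that start with a 'simple' verb."""
    simple_markers = ("summarize", "translate", "rewrite")
    return len(query.split()) < 30 and query.lower().startswith(simple_markers)


def fake_slm(query: str) -> str:
    return f"[SLM] {query}"  # placeholder for a call to a small model


def fake_llm(query: str) -> str:
    return f"[LLM] {query}"  # placeholder for a call to a large model


route = make_router(toy_is_simple, fake_slm, fake_llm)
print(route("Summarize this memo in two sentences."))
# → [SLM] Summarize this memo in two sentences.
print(route("Compare these three contracts and flag conflicting clauses."))
# → [LLM] Compare these three contracts and flag conflicting clauses.
```

Passing the classifier in as a function keeps the dispatcher independent of how the decision is made, so the heuristic could be swapped for a call to a large model acting as the router without changing `make_router`.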