Why Google’s New AI Model is Serious Business
🌈 Abstract
The article discusses the rise of Small Language Models (SLMs) as a promising alternative to frontier AI models, which have shown signs of stagnation. It highlights the impressive performance of Google's Gemma2 model, which outperforms the original GPT-4 while being significantly smaller in size. The article argues that SLMs like Gemma2 are more relevant for enterprise adoption due to their ability to be retrained to eliminate hallucinations and achieve higher accuracy levels compared to larger models.
🙋 Q&A
[01] The Rise of Small Language Models (SLMs)
1. What are the key reasons why SLMs are gaining attention over frontier AI models?
- Frontier AI models have shown signs of stagnation, with new models being incremental improvements rather than step function upgrades.
- SLMs like Gemma2 can match the performance of larger models like GPT-4 while being significantly smaller in size.
- SLMs can be more efficiently retrained to eliminate hallucinations and achieve higher accuracy levels, which is crucial for enterprise adoption.
2. How does Gemma2 compare to other frontier AI models?
- Gemma2-27B outperforms the original GPT-4 while being 66 times smaller.
- Gemma2-9B, the smaller version, essentially matches the performance of GPT-4, making it the first sub-10-billion-parameter model to do so.
- Gemma2 is now considered the best pound-for-pound language model family in the industry.
3. What are the key technical innovations in the Gemma2 model?
- Gemma2 interleaves two attention variants: local sliding-window attention to capture short-range dependencies and global attention to capture important long-range dependencies.
- It uses Grouped-Query Attention (GQA), in which several query heads share a single key/value head, reducing the memory size of the Key-Value (KV) cache, a significant bottleneck for inference on long sequences.
- The smaller Gemma2 models were trained using knowledge distillation, where they were taught to imitate the probability distribution of the larger teacher model, rather than the traditional next-token prediction approach.
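The local/global split above can be illustrated with attention masks. This is a minimal sketch of the two patterns; the sequence length and window size are illustrative placeholders, not Gemma2's actual configuration.

```python
# Sketch of the two attention patterns such models interleave: a "local"
# sliding-window mask for nearby tokens and a "global" causal mask that
# reaches back to the start of the sequence.

def causal_global_mask(seq_len):
    """Each token may attend to every earlier token (and itself)."""
    return [[q >= k for k in range(seq_len)] for q in range(seq_len)]

def sliding_window_mask(seq_len, window):
    """Each token may attend only to the last `window` tokens."""
    return [[(q >= k) and (q - k < window) for k in range(seq_len)]
            for q in range(seq_len)]

g = causal_global_mask(6)
l = sliding_window_mask(6, window=3)
# The last token sees all 6 positions globally, but only 3 locally.
print(sum(g[5]))  # 6
print(sum(l[5]))  # 3
```

Interleaving the two lets most layers pay only for a short window while a subset of layers still propagates information across the full context.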
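The KV-cache saving from GQA is straightforward to estimate: the cache scales with the number of key/value heads, so sharing KV heads across query heads shrinks it proportionally. A back-of-the-envelope sketch, using made-up model dimensions rather than Gemma2's published configuration:

```python
# KV-cache size under standard multi-head attention vs. Grouped-Query
# Attention (GQA). All dimensions below are illustrative placeholders.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2x for keys and values; one entry per layer, per KV head, per token.
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

layers, heads, head_dim, seq_len = 32, 32, 128, 8192
mha = kv_cache_bytes(layers, kv_heads=heads, head_dim=head_dim, seq_len=seq_len)
gqa = kv_cache_bytes(layers, kv_heads=8, head_dim=head_dim, seq_len=seq_len)
print(mha // gqa)  # 32 query heads sharing 8 KV heads -> 4x smaller cache
```

Because the cache grows linearly with sequence length, this ratio is exactly the saving on long-context inference, where the KV cache, not the weights, dominates memory.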
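The distillation objective can be sketched in a few lines: instead of a one-hot next-token target, the student is trained to match the teacher's full probability distribution, for example by minimizing KL divergence. The toy distributions below are made up for illustration.

```python
import math

def kl_divergence(teacher, student):
    """KL(teacher || student) over a shared vocabulary."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

teacher = [0.70, 0.20, 0.10]          # soft targets from the large model
good_student = [0.65, 0.25, 0.10]     # close to the teacher
bad_student = [0.10, 0.20, 0.70]      # far from the teacher

# The closer the student's distribution to the teacher's, the lower the loss.
assert kl_divergence(teacher, good_student) < kl_divergence(teacher, bad_student)
```

The soft targets carry much more signal per token than a single "correct" next word, which is one intuition for why a distilled small model can punch above its parameter count.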
[02] The Challenges of Frontier AI Models and the Potential of SLMs
1. What are the main challenges faced by frontier AI models like GPT-4 and Claude?
- Despite the release of newer models, the improvements have been incremental compared to GPT-4, which finished training in 2022.
- The models still struggle with hallucinations, where they generate inaccurate or irrelevant information, and achieving the high accuracy levels required for enterprise adoption (typically 95% or more).
- Retraining these large models to reduce hallucinations is prohibitively expensive, with estimates suggesting it would cost $68 million for the 70B LLaMa 3 model.
2. Why are SLMs like Gemma2 better suited for enterprise adoption?
- SLMs can be more efficiently retrained to eliminate hallucinations and achieve higher accuracy levels, making them more suitable for enterprise workloads.
- The article predicts that most enterprises will embrace SLMs much faster than frontier models due to their ability to be retrained and fine-tuned to specific tasks.
- Gemma2, in particular, is highlighted as a powerful SLM that can match the performance of larger models while being significantly smaller in size.
3. How does the licensing of the Gemma2 model impact its adoption?
- The Gemma2 models are released under a license that has been heavily criticized, with some claiming it is not even an open-source license due to the number of prohibited use cases.
- The dataset used to train the Gemma2 models was not made available, which means the model is considered "open-weights" rather than fully open-source.
- Despite the licensing concerns, the article still considers the Gemma2 model to be a top-performing SLM that is more relevant for enterprise adoption than larger frontier models.