Why Big Tech Wants to Make AI Cost Nothing
Abstract
The article discusses Meta's decision to open-source the Llama 3.1 large language model (LLM), which is competitive with OpenAI's ChatGPT and Anthropic's Claude. It explores the potential reasons behind making Llama 3.1 freely available, including the "commoditize your complement" business strategy, the scale of infrastructure required to train large language models, and the potential impact on smaller AI startups.
Q&A
[01] Meta's Release of Llama 3.1
1. What are the key reasons behind Meta's decision to open-source and release Llama 3.1 for free?
- The article suggests that Meta may be employing the "commoditize your complement" business strategy, where making the LLM freely available can increase demand for complementary products and services, such as cloud computing and GPU hardware.
- Another potential reason is to increase user-generated content on Meta's platforms, since users can now create AI-generated content and fine-tune the pre-trained models themselves.
- The article also suggests that Meta may not see value in having a second-place general-purpose LLM, particularly if users do not trust the company enough to rely on it for subscription-based API access.
2. What are the technical challenges involved in training large language models like Llama 3.1?
- According to the article, training Llama 3.1 on 16,000 H100 GPUs took 54 days, and Meta is expected to have the equivalent of 600,000 H100 GPUs by the end of 2024.
- This massive infrastructure would allow Meta to train roughly 75 GPT-4-scale models every 90 days, or about 300 such models per year, dwarfing the capabilities of smaller AI companies.
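The throughput claim above can be checked with a back-of-envelope calculation. This is only a sketch using the article's own figures; note that a straight extrapolation from the 54-day, 16,000-GPU Llama 3.1 run yields roughly 62 models per 90 days, so the article's figure of 75 presumably assumes somewhat shorter runs or a different accounting of GPU-equivalents.

```python
# Back-of-envelope estimate of how many frontier-scale training runs
# Meta's projected GPU fleet could sustain, using the article's figures.

GPUS_PER_RUN = 16_000       # H100s reportedly used to train Llama 3.1
DAYS_PER_RUN = 54           # reported Llama 3.1 training time
FLEET_H100_EQUIV = 600_000  # Meta's projected H100-equivalents by end of 2024
WINDOW_DAYS = 90

# Number of 16k-GPU training runs the fleet could host in parallel
parallel_runs = FLEET_H100_EQUIV / GPUS_PER_RUN   # 37.5

# Runs each parallel slot completes within a 90-day window
runs_per_slot = WINDOW_DAYS / DAYS_PER_RUN        # ~1.67

models_per_window = parallel_runs * runs_per_slot
print(f"~{models_per_window:.0f} GPT-4-scale models per {WINDOW_DAYS} days")
print(f"~{models_per_window * 365 / WINDOW_DAYS:.0f} models per year")
```

Whether the exact figure is 62 or 75, the order of magnitude is the point: a fleet of hundreds of thousands of H100-equivalents turns a frontier-scale training run from a once-a-year event into a routine, weekly occurrence.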
[02] Impact on the AI Ecosystem
1. How might the open-sourcing of large language models like Llama 3.1 impact smaller AI startups?
- The article suggests that the "big losers" in the commoditization of LLMs may be the current "hot and disruptive AI startups" like OpenAI, Anthropic, Character.ai, Cohere, and Mistral, as the largest tech companies start giving away their main product for free.
- This could lead to a "reckoning" for these smaller companies, as they may struggle to compete with the scale and resources of the tech giants.
2. What are the potential implications for the path towards artificial general intelligence (AGI) or artificial superintelligence (ASI)?
- The article raises the question of whether the current path of scaling ever larger multimodal transformer models will ultimately lead to AGI or ASI.
- It suggests that if the smaller companies hold a modeling or R&D edge that isn't simply a matter of owning massive numbers of GPUs, they may still be able to outflank the megacorps in the race towards more capable models and new avenues of research.
3. How does the current AI infrastructure buildout compare to the dotcom bubble and Web 2.0 era?
- The article draws a parallel between the current AI infrastructure buildout and the infrastructure buildout that preceded the dotcom bubble and enabled the rise of Web 2.0 companies.
- It suggests that the current AI infrastructure buildout may enable breakthroughs in other areas, such as robotics, autonomous vehicles, and drug development, similar to how the earlier infrastructure buildout enabled cloud computing and streaming video.