
Thread by @martin_casado on Thread Reader App

🌈 Abstract

The thread discusses the current state of large language models (LLMs) and the challenges of achieving large scale increases between model generations, such as the jump from GPT-3 to GPT-4.

🙋 Q&A

[01] The state of LLMs

1. What are the key challenges in achieving a 100x scale increase between LLM versions?

  • Achieving another 100x scale increase, on the order of the jump from GPT-3 to GPT-4, is going to be very difficult.
  • We are nearly out of general language tokens, so the best-case scenario is roughly a 2x increase, which could perhaps be stretched to 3–4x with more proprietary tokens and better data cleaning.
  • A 100x training run would require a gigawatt-scale datacenter, which does not yet exist (see the rough estimate after this list).
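To make the gigawatt figure concrete, here is a rough back-of-envelope estimate in Python. All numbers are illustrative assumptions (baseline power draw, efficiency gain, fixed training duration), not figures from the thread.

```python
# Rough back-of-envelope sketch (illustrative assumptions only, not figures
# from the thread): if a GPT-4-class training run drew on the order of tens
# of megawatts for its duration, a 100x run at similar efficiency and the
# same duration lands in gigawatt territory.

ASSUMED_BASELINE_POWER_MW = 20   # hypothetical draw of a GPT-4-scale run
SCALE_FACTOR = 100               # the 100x compute jump discussed in the thread
ASSUMED_EFFICIENCY_GAIN = 2.0    # hypothetical perf/watt gain from newer hardware

required_power_mw = ASSUMED_BASELINE_POWER_MW * SCALE_FACTOR / ASSUMED_EFFICIENCY_GAIN
print(f"Estimated power for a 100x run: {required_power_mw:.0f} MW "
      f"(~{required_power_mw / 1000:.1f} GW)")
```

Under these assumed inputs the estimate comes out to about 1 GW, which is why the thread frames the next jump as a datacenter-scale problem rather than a model-architecture problem.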

2. How are companies addressing the limitations in scaling LLMs?

  • There is a focus on extracting more learning from the same data, but no major breakthroughs have been reported.
  • Companies are exploring ways to push planning to inference time in certain domains like coding, but it is unclear how much improvement this can buy (see the sketch after this list).
  • Both OpenAI and Anthropic are focusing on improving math and coding abilities through various "synthetic" compute methods, such as simulated data or recursive self-improvement.
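As a hedged illustration of what "pushing planning to inference" and "synthetic" compute can look like, the sketch below samples several candidate answers at inference time, keeps only those a verifier accepts, and notes that verified pairs could be recycled as synthetic training data. The `toy_model` and `verifier` functions are stand-ins invented for this example, not anything described in the thread or used by OpenAI or Anthropic.

```python
import random

def toy_model(problem: str, temperature: float) -> int:
    """Stand-in for an LLM: returns a noisy guess at a fixed toy problem."""
    noise = random.choice([-2, -1, 0, 0, 1, 2]) if temperature > 0 else 0
    return 84 + noise

def verifier(problem: str, answer: int) -> bool:
    """In math/coding domains, candidate answers can often be checked mechanically."""
    return answer == 12 * 7

def best_of_n(problem: str, n: int = 8):
    """Sample n candidates at inference time and return the first verified one."""
    candidates = [toy_model(problem, temperature=1.0) for _ in range(n)]
    verified = [c for c in candidates if verifier(problem, c)]
    return verified[0] if verified else None

answer = best_of_n("What is 12 * 7?")
print("verified answer:", answer)
# A verified (problem, answer) pair could then be added to a synthetic
# training set, which is one way the self-improvement loop is often framed.
```

The trade is extra inference-time compute for reliability in domains where verification is cheap, which is the coding and math angle the thread highlights.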

3. What other factors are impacting the development of LLMs?

  • Policies like California's SB 1047 threaten to slow down progress in LLM development.

4. What is the author's overall outlook on the future of LLM scale increases?

  • The author does not see where the 100x jump in general language reasoning will come from, which is why the focus is shifting towards math and coding abilities.