A Sanity Check on ‘Emergent Properties’ in Large Language Models
🌈 Abstract
The article discusses the concept of "emergent properties" in the context of large language models (LLMs) and the lack of clarity around the term. It examines four different definitions of "emergence" used by NLP researchers and the implications of these definitions. The article also presents empirical evidence that casts doubt on the existence of true "emergent properties" in LLMs, as well as the results of a survey of NLP researchers' beliefs about emergence. Overall, the article argues that the term "emergence" is being used loosely and without a clear research agenda, which could have significant consequences for the field.
🙋 Q&A
[01] Definitions of "Emergence"
1. What are the four different definitions of "emergence" used by NLP researchers according to the article? The article identifies four main definitions:
- A property that a model exhibits despite not being explicitly trained for it.
- A property that the model learned from the training data (the opposite of definition 1).
- A property that is present in larger models but not in smaller models.
- A version of definition 3 in which the "emergence" is characterized by its sharpness (transitioning seemingly instantaneously) and its unpredictability.
2. Which definition was the most popular among the NLP researchers surveyed? According to the article, the most popular definition of "emergence" among the 220 NLP researchers and PhD students surveyed was definition 1 (a property that a model exhibits despite not being explicitly trained for it), with definition 4 being the second most popular.
3. How does the article evaluate the validity of these different definitions of "emergence"? The article argues that definitions (2), (3), and (4) add little beyond existing machine learning concepts such as "learning" and "scaling". For definition (1), it argues that proving a property is truly "emergent" (i.e., that the model was not exposed to relevant training data) is very difficult, especially for commercial models whose training data is undisclosed.
[02] Empirical Evidence and Researcher Beliefs
1. What are some of the empirical results presented in the article that cast doubt on the existence of "emergent properties" in LLMs? The article discusses several studies that have failed to find evidence of advanced "theory of mind" or other purported "emergent abilities" in LLMs like GPT-4. It argues that checking only for direct (verbatim) matches between test items and public data may miss similar or paraphrased cases in the training data, and that commercial models may also be trained on non-public data (see the illustrative sketch after this list).
2. What did the survey of NLP researchers reveal about their personal beliefs and perceptions of the field's consensus on "emergent properties"? The survey results showed that while most respondents were skeptical or unsure about LLMs having "emergent properties" themselves (only 39% agreed), 70% thought that most other NLP researchers did believe this. The article argues this is in line with other false sociological beliefs in the field, where the minority view is perceived to be the majority.
3. How did the researchers' beliefs change after the article's discussion of "emergence"? In a sample of 70 researchers who originally agreed that LLMs have "emergent properties", 83% changed their belief to either disagreeing (13.9%) or being unsure (69.4%) after the presentation of the article's arguments.
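To make the limitation discussed in question 1 above concrete, here is a minimal sketch (not from the article) of a naive contamination check that only flags verbatim matches between a test item and sampled public data. The function names and example sentences are hypothetical; the point is that an exact-match check returns no hit for a simple paraphrase, which is exactly the kind of "similar case" the article says this methodology can miss.

```python
# Minimal, hypothetical sketch of a verbatim-only "data contamination" check.
# It flags a test item only if it appears word-for-word in a sampled corpus,
# so paraphrases and near-duplicates go undetected.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(text.lower().split())

def verbatim_contaminated(test_item: str, corpus_docs: list[str]) -> bool:
    """Return True only if the normalized test item appears verbatim in some document."""
    needle = normalize(test_item)
    return any(needle in normalize(doc) for doc in corpus_docs)

# Hypothetical example: a paraphrase of the test item is not flagged.
corpus = ["Sally puts the marble in the basket and leaves the room."]
print(verbatim_contaminated(
    "Sally puts the marble in the basket and leaves the room.", corpus))  # True: exact match
print(verbatim_contaminated(
    "Sally places the marble into the basket before leaving.", corpus))   # False: paraphrase missed
```

A check along these lines says nothing about whether semantically equivalent material was in the (possibly undisclosed) training data, which is why the article treats absence of exact matches as weak evidence of genuine "emergence".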
[03] Unresolved Research Questions
1. What is the key unresolved question identified by the article regarding "emergence" in LLMs? The article states that the key unresolved question is what kind of interpolation of existing patterns in the training data would even count as something "new" or "emergent" enough to qualify as a genuine phenomenon, given the complexity of natural language data.
2. Why does the article suggest that proving the existence of "emergent properties" is particularly challenging for the domain of natural language? The article argues that the natural language domain is particularly hard because it mixes different kinds of information (linguistic, social, factual, commonsense) that may be present in the training data in complex ways (explicitly, implicitly, or requiring long-range reasoning).
3. What does the article suggest is the actual research question behind claims of "emergence" in LLMs? The article suggests that the real research question may be "what data exists on the Web?" (or in proprietary training datasets), and that training LLMs is an expensive way of answering that question rather than a way of discovering genuinely new or "emergent" phenomena.