Summarized by Aili
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
Abstract
The article explores the implications of integrating synthetic data into large language models (LLMs), examining how it can influence the models' characteristics and preferences. The key points are covered in the Q&A below.
Q&A
[01] Passive Inheritance of Teacher Properties
1. How does passive inheritance impact model generation properties?
- The article finds that even when the synthetic data prompts are neutral, the models are influenced in unforeseen ways, with notable changes across various attributes:
- Social bias: Models can see relative changes in overall social bias profile of up to 36%, with individual bias scores increasing by as much as 80%.
- Textual characteristics: Significant changes are observed, with mean token length increasing by over 100% in some cases, and lexical diversity increasing by up to 16%.
- Toxicity: scores increase by up to 40% in the worst case.
2. How does passive inheritance impact model preferences when used as evaluators?
- The article finds that the origin of the synthetic data directly influences the preferences of the models trained on this data:
- Models trained on synthetic data from other models inherit similar preferences, with inter-model agreement increasing by up to 13.20%.
- Models' alignment with human preferences can increase or decrease depending on whether the synthetic data comes from a stronger or weaker teacher model.
- The architecture prior of the base model outweighs the influence of the synthetic data when it comes to defining model preferences.
[02] Active Inheritance of Desirable Non-Differentiable Properties
1. How can active inheritance be used to enhance desired attributes?
- The article proposes "active inheritance" as a strategy to steer generations towards desirable non-differentiable attributes by:
- Generating multiple samples per prompt from diverse models
- Selecting the sample that maximizes the desired characteristic (e.g. length, lexical diversity)
- This approach is shown to effectively increase mean token length by up to 116% and lexical diversity by up to 43% compared to baseline.
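The selection step above amounts to best-of-n sampling against a non-differentiable metric. Here is a minimal sketch, assuming simple proxy metrics (whitespace-token count for length, type-token ratio for lexical diversity); the function and metric names are illustrative, not taken from the paper:

```python
# Hypothetical sketch of "active inheritance": for each prompt, score the
# candidate completions (e.g. drawn from several teacher models) on a
# non-differentiable attribute and keep the best one for the fine-tuning set.

def token_length(text: str) -> float:
    """Proxy metric: whitespace-token count (stand-in for a real tokenizer)."""
    return float(len(text.split()))

def lexical_diversity(text: str) -> float:
    """Proxy metric: type-token ratio (unique tokens / total tokens)."""
    tokens = text.split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def select_best(candidates: list[str], score, maximize: bool = True) -> str:
    """Pick the candidate that maximizes (or minimizes) the given metric."""
    return (max if maximize else min)(candidates, key=score)

samples = [
    "The cat sat.",
    "The quick brown fox jumps over the lazy dog near the river bank.",
    "Dogs dogs dogs dogs dogs.",
]

longest = select_best(samples, token_length)            # favor longer outputs
most_diverse = select_best(samples, lexical_diversity)  # favor varied wording
```

The same selection loop covers the mitigation case described next: to reduce an undesirable attribute such as toxicity, swap in a toxicity scorer and call `select_best(..., maximize=False)`.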
2. How can active inheritance be used to mitigate negative attributes?
- The article demonstrates that active inheritance can also be used to decrease undesirable characteristics, such as toxicity, by up to 40% compared to baseline.