Summarized by Aili
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
Abstract
The article explores the implications of integrating synthetic data into large language models (LLMs), examining how it can influence the models' characteristics and preferences. The key points are covered in the Q&A below.
Q&A
[01] Passive Inheritance of Teacher Properties
1. How does passive inheritance impact model generation properties?
- The article finds that even when the synthetic data prompts are neutral, the models are influenced in unforeseen ways, with notable changes across various attributes:
- Social bias: Models can see relative changes in overall social bias profile of up to 36%, with individual bias scores increasing by as much as 80%.
- Textual characteristics: Significant changes are observed, with mean token length increasing by over 100% in some cases, and lexical diversity increasing by up to 16%.
- Toxicity: scores increase by up to 40% in the worst case.
2. How does passive inheritance impact model preferences when used as evaluators?
- The article finds that the origin of the synthetic data directly influences the preferences of the models trained on this data:
- Models trained on synthetic data from other models inherit similar preferences, with inter-model agreement increasing by up to 13.20%.
- Models' alignment with human preferences can increase or decrease depending on whether the synthetic data comes from a stronger or weaker teacher model.
- The architecture prior of the base model outweighs the influence of the synthetic data when it comes to defining model preferences.
[02] Active Inheritance of Desirable Non-Differentiable Properties
1. How can active inheritance be used to enhance desired attributes?
- The article proposes "active inheritance" as a strategy to steer generations towards desirable non-differentiable attributes by:
- Generating multiple samples per prompt from diverse models
- Selecting the sample that maximizes the desired characteristic (e.g. length, lexical diversity)
- This approach is shown to effectively increase mean token length by up to 116% and lexical diversity by up to 43% compared to baseline.
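The selection step above amounts to best-of-n sampling against a non-differentiable metric. Here is a minimal sketch, assuming simple proxy metrics (whitespace-token count for length, type-token ratio for lexical diversity); the function and metric names are illustrative, not taken from the paper:

```python
# Hypothetical sketch of "active inheritance": for each prompt, score the
# candidate completions (e.g. drawn from several teacher models) on a
# non-differentiable attribute and keep the best one for the fine-tuning set.

def token_length(text: str) -> float:
    """Proxy metric: whitespace-token count (stand-in for a real tokenizer)."""
    return float(len(text.split()))

def lexical_diversity(text: str) -> float:
    """Proxy metric: type-token ratio (unique tokens / total tokens)."""
    tokens = text.split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def select_best(candidates: list[str], score, maximize: bool = True) -> str:
    """Pick the candidate that maximizes (or minimizes) the given metric."""
    return (max if maximize else min)(candidates, key=score)

samples = [
    "The cat sat.",
    "The quick brown fox jumps over the lazy dog near the river bank.",
    "Dogs dogs dogs dogs dogs.",
]

longest = select_best(samples, token_length)            # favor longer outputs
most_diverse = select_best(samples, lexical_diversity)  # favor varied wording
```

The same selection loop covers the mitigation case described next: to reduce an undesirable attribute such as toxicity, swap in a toxicity scorer and call `select_best(..., maximize=False)`.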
2. How can active inheritance be used to mitigate negative attributes?
- The article demonstrates that active inheritance can also be used to decrease undesirable characteristics, such as toxicity, by up to 40% compared to baseline.