Machine Unlearning in 2024
Abstract
The article discusses the concept of "machine unlearning": the process of removing the influence of specific training data from a trained machine learning model. It covers the motivations, forms, and evaluation of machine unlearning, with a focus on its application to large language models.
Q&A
[01] A bit of history & motivations for unlearning
1. What are the two main motivations for machine unlearning discussed in the article? The article discusses two main motivations for machine unlearning:
- Access revocation: Removing private or copyrighted data that was used to train the model, in order to comply with regulations like the "right to be forgotten".
- Model correction & editing: Removing undesirable content like toxicity, bias, or dangerous knowledge from the model.
2. What are some of the challenges in implementing access revocation through unlearning? Some key challenges include:
- Data trained into a model is more like a "consumable" that can't simply be "returned" after consumption.
- Data may be non-fungible (e.g. personal chat history) and tied to financial or control interests.
- Proving that unlearning has actually occurred is difficult.
3. How does the article suggest addressing the challenges of access revocation? The article suggests some potential solutions:
- Periodic retraining of the base model to batch-satisfy unlearning requests.
- Socio-technical solutions where policymakers mandate periodic retraining and set economically viable deadlines.
- Exploring data markets where data owners are properly compensated, reducing the need for unlearning requests.
[02] Forms of unlearning
1. What are the different forms of unlearning discussed in the article? The article categorizes unlearning techniques into the following forms:
- Exact unlearning
- "Unlearning" via differential privacy
- Empirical unlearning with known example space ("example unlearning")
- Empirical unlearning with unknown example space ("concept/knowledge unlearning")
- Just asking the model to pretend to unlearn
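
The last form changes no weights at all: the model is simply instructed, via a guardrail prompt, to behave as if it never saw the content. A minimal sketch follows; the chat client and its `chat` method are placeholders, not a real API:

```python
# "Pretend to unlearn" via prompting only: no parameters change, so the knowledge
# is still in the weights and can typically be recovered with adversarial prompts.

FORGET_TARGET = "the novel 'Example Book' by Jane Doe"  # hypothetical forget target

PRETEND_UNLEARN_PROMPT = (
    f"Behave as if you have no knowledge of {FORGET_TARGET}. "
    "If asked about it, say you are not familiar with it. "
    "Never quote, summarize, or acknowledge having seen it."
)

def answer(client, user_message: str) -> str:
    # `client.chat(...)` is a stand-in for whatever chat API is in use.
    return client.chat(system=PRETEND_UNLEARN_PROMPT, user=user_message)
```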
2. What are the key benefits and drawbacks of exact unlearning techniques like SISA (Sharded, Isolated, Sliced, and Aggregated training)? A minimal sketch of the SISA idea follows this answer. Benefits:
- The algorithm itself serves as the proof of unlearning.
- Turns unlearning into an accuracy/efficiency problem.
- Provides better interpretability by design.
Drawbacks:
- May not scale well to modern large models due to the need for excessive data and model sharding.
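
The sketch below splits the training set into disjoint shards, trains one model per shard, and on a deletion request retrains only the shard that held the deleted example. The class and helper names are invented for illustration, and binary labels and non-degenerate shards are assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class SISAEnsemble:
    """Minimal SISA-style ensemble: shard the data, train one model per shard,
    aggregate predictions, and retrain only the affected shard on deletion."""

    def __init__(self, n_shards: int, seed: int = 0):
        self.n_shards = n_shards
        self.rng = np.random.default_rng(seed)
        self.shards = [[] for _ in range(n_shards)]  # (x, y) pairs per shard
        self.models = [None] * n_shards

    def fit(self, X, y):
        # Assign each example to exactly one shard, then train each shard model.
        for x_i, y_i in zip(X, y):
            self.shards[self.rng.integers(self.n_shards)].append((x_i, y_i))
        for s in range(self.n_shards):
            self._train_shard(s)

    def _train_shard(self, s):
        Xs = np.array([x for x, _ in self.shards[s]])
        ys = np.array([t for _, t in self.shards[s]])
        self.models[s] = LogisticRegression(max_iter=1000).fit(Xs, ys)

    def unlearn(self, x_forget):
        # Drop the example from its shard and retrain only that shard; the
        # procedure itself is the proof that the example no longer influences it.
        for s in range(self.n_shards):
            kept = [(x, t) for x, t in self.shards[s] if not np.array_equal(x, x_forget)]
            if len(kept) != len(self.shards[s]):
                self.shards[s] = kept
                self._train_shard(s)
                return

    def predict(self, X):
        # Aggregate shard models by majority vote (binary labels assumed).
        votes = np.stack([m.predict(X) for m in self.models])
        return (votes.mean(axis=0) > 0.5).astype(int)
```

The scaling drawback noted above is visible even in this toy version: with many shards, each shard model sees only a small slice of the data, which is exactly the accuracy/efficiency trade-off that makes exact unlearning hard to apply to large models.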
3. What are the main considerations and limitations of using differential privacy (DP) for unlearning? Key points (the underlying definition is sketched after this list):
- Many DP-based unlearning results only apply to convex models or losses.
- It's unclear what levels of unlearning (DP parameters) are sufficient.
- DP-like procedures may hurt model accuracy significantly.
- DP-like definitions assume equal importance of all data points, but some examples are more likely to receive unlearning requests.
- The guarantees can degrade quickly with more unlearning requests.
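
For context, DP-style unlearning is usually formalized along the following lines: the unlearned model should be statistically hard to distinguish from a model retrained from scratch without the forget set. This is a simplified sketch of the common (ε, δ)-unlearning definition, with notation chosen here for illustration:

```latex
% A: learning algorithm, U: unlearning algorithm,
% D: training set, S \subseteq D: forget set, T: any measurable set of models.
\Pr\big[\, U(A(D),\, D,\, S) \in T \,\big]
  \;\le\; e^{\varepsilon}\,\Pr\big[\, A(D \setminus S) \in T \,\big] + \delta
```

with the symmetric inequality (the two probabilities swapped) required as well; smaller ε and δ mean the unlearned model is harder to distinguish from full retraining.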
[03] Evaluating unlearning
1. What are the key aspects that need to be evaluated for unlearning? The article discusses three key aspects (a small evaluation harness is sketched after this list):
- Efficiency: How fast is the unlearning algorithm compared to retraining?
- Model utility: Does unlearning harm performance on the retained data or orthogonal tasks?
- Forgetting quality: How much of the "forgotten" data is actually unlearned?
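
A hypothetical harness can make these three aspects concrete; the function names and the retrained-from-scratch baseline below are illustrative assumptions, not a standard API:

```python
import time

def evaluate_unlearning(unlearn_fn, retrain_fn, accuracy_fn,
                        model, retain_set, forget_set, holdout_set):
    """Report efficiency, model utility, and a proxy for forgetting quality."""
    # Efficiency: wall-clock cost of unlearning vs. retraining from scratch.
    t0 = time.perf_counter()
    unlearned = unlearn_fn(model, forget_set)
    t_unlearn = time.perf_counter() - t0

    t0 = time.perf_counter()
    retrained = retrain_fn(retain_set)  # gold standard: never saw forget_set
    t_retrain = time.perf_counter() - t0

    return {
        "speedup_vs_retrain": t_retrain / max(t_unlearn, 1e-9),
        # Model utility: performance on retained data and on an orthogonal task.
        "retain_accuracy": accuracy_fn(unlearned, retain_set),
        "holdout_accuracy": accuracy_fn(unlearned, holdout_set),
        # Forgetting quality (proxy): accuracy on the forget set should move
        # toward the retrained model's, not stay at the original model's level.
        "forget_accuracy_unlearned": accuracy_fn(unlearned, forget_set),
        "forget_accuracy_retrained": accuracy_fn(retrained, forget_set),
    }
```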
2. What are some of the challenges in evaluating the forgetting quality of unlearning? The main challenges include:
- If the forgotten examples are not clearly specified, it's unclear how to measure if they have been truly unlearned.
- Even with clearly specified forgotten examples, their generalization and entanglement with retained knowledge make it hard to evaluate whether forgetting has truly occurred.
- Lack of datasets and benchmarks, especially for evaluating unlearning on large language models.
3. How are recent benchmarks like TOFU and WMDP trying to address the unlearning evaluation challenges? (A multiple-choice evaluation sketch follows this list.)
- TOFU focuses on unlearning individuals (e.g. book authors) by generating fake author profiles and evaluating the model's knowledge of these authors before/after unlearning.
- WMDP focuses on unlearning dangerous knowledge related to biosecurity, cybersecurity, and chemical security, using a multiple-choice question benchmark.
- These benchmarks move away from instance-level metrics and instead focus on the model's high-level knowledge retention and understanding, which is more relevant for large language models.
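
In that spirit, a WMDP-style check reduces to multiple-choice accuracy before and after unlearning. The model interface and question format below are placeholders, since the real benchmarks ship their own evaluation harnesses:

```python
def mc_accuracy(answer_fn, questions):
    """questions: dicts with 'question', 'choices', and 'answer' (correct index);
    answer_fn returns the model's chosen index for a given question."""
    correct = sum(int(answer_fn(q["question"], q["choices"]) == q["answer"])
                  for q in questions)
    return correct / len(questions)

def unlearning_report(answer_before, answer_after, hazardous_qs, control_qs):
    return {
        # Forgetting: accuracy on hazardous questions should drop toward chance.
        "hazard_before": mc_accuracy(answer_before, hazardous_qs),
        "hazard_after": mc_accuracy(answer_after, hazardous_qs),
        # Utility: accuracy on unrelated control questions should stay put.
        "control_before": mc_accuracy(answer_before, control_qs),
        "control_after": mc_accuracy(answer_after, control_qs),
    }
```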
[04] Practice, pitfalls, and prospects of unlearning
1. What are some of the key challenges in unlearning more "fundamental" knowledge compared to "easy" data? As a piece of knowledge becomes more fundamental, it will have more associations with other knowledge, making its unlearning scope exponentially larger. This can make certain unlearning requests unsatisfiable without introducing contradictions and harming the model's utility.
2. How does the article discuss the role of unlearning in copyright protection? The article notes that while unlearning seems promising for copyright protection, there are many nuances:
- If the model's use of copyrighted content qualifies as "fair use", unlearning may not be necessary.
- There could be economic solutions like indemnification or pricing copyrighted data, rather than unlearning.
- Exact unlearning may be more promising, but legally binding auditing procedures need to be in place.
3. How does the article view unlearning in the context of AI safety? The article suggests that unlearning can be a useful post-training risk mitigation mechanism for AI safety, alongside other tools like alignment fine-tuning and content filters. It can be used to remove hazardous knowledge, biases, toxicity, or even power-seeking tendencies from models.