
Language Modeling with Editable External Knowledge

🌈 Abstract

The paper introduces ERASE, a method that keeps a language model's external knowledge base up to date by editing it whenever a new document is acquired: existing entries are rewritten or deleted so the knowledge base stays consistent with the latest information, rather than leaving conflicts to be resolved at prediction time. The paper also introduces two new benchmark datasets, CLARK-News and CLARK-Conversations, which evaluate a model's ability to answer questions about a stream of evolving information.

🙋 Q&A

[01] Language Modeling with Editable External Knowledge

1. What is the key problem that the paper aims to address?

  • The world and the language used to describe it are constantly changing. The paper aims to develop language models and other software systems that can reflect these changes.

2. What are the key limitations of current retrieval-augmented generation (RAG) approaches?

  • RAG approaches sometimes retrieve stale documents that have been invalidated by new information, leading to incorrect answers.

3. How does ERASE attempt to address this limitation?

  • ERASE updates the knowledge base at document insertion time, by identifying related documents and deciding whether to keep, edit, or delete them. This allows new information to be propagated and prevents stale information from being used for inference.
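
  A minimal sketch of this insertion-time update, in Python. The callables passed in (retrieve, classify, rewrite, extract) stand in for the retriever and the LM prompts; they are illustrative assumptions, not the paper's implementation.

  ```python
  from typing import Callable, List

  def insert_document(
      kb: List[str],
      new_doc: str,
      retrieve: Callable[[List[str], str], List[str]],  # find stored facts related to the document
      classify: Callable[[str, str], str],              # returns "keep", "edit", or "delete"
      rewrite: Callable[[str, str], str],               # rewrite a stale fact into a true one
      extract: Callable[[str], List[str]],              # pull new facts out of the document
  ) -> List[str]:
      """Reconcile the fact store with an incoming document, then add its facts."""
      related = set(retrieve(kb, new_doc))
      updated: List[str] = []
      for fact in kb:
          if fact not in related:
              updated.append(fact)            # unrelated facts are left untouched
              continue
          decision = classify(fact, new_doc)  # decide what the new document implies
          if decision == "keep":
              updated.append(fact)
          elif decision == "edit":
              updated.append(rewrite(fact, new_doc))
          # decision == "delete": drop the stale fact entirely
      updated.extend(extract(new_doc))        # finally, add facts from the new document
      return updated
  ```

  Because the reconciliation happens when the document is inserted, queries at prediction time only ever see the already-updated fact store.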

4. What are the two new benchmark datasets introduced in the paper?

  • CLARK-News: a factual QA domain consisting of timestamped news articles and questions.
  • CLARK-Conversations: a long-conversation domain where facts about conversation participants evolve over the course of the conversation.

[02] ERASE Method

1. What are the three main steps of the ERASE method?

  1. Retrieve facts to edit
  2. Update retrieved facts
  3. Add new facts
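
  Focusing on step 1, the sketch below retrieves candidate facts to edit by cosine similarity between an embedding of the new document and embeddings of the stored facts. The generic embed callable and the top-k cutoff are assumptions for illustration, not necessarily the retriever used in the paper.

  ```python
  import numpy as np
  from typing import Callable, List

  def retrieve_facts_to_edit(
      facts: List[str],
      new_doc: str,
      embed: Callable[[List[str]], np.ndarray],  # any sentence-embedding model
      k: int = 10,
  ) -> List[str]:
      """Return the k stored facts most similar to the incoming document."""
      if not facts:
          return []
      doc_vec = embed([new_doc])[0]
      fact_vecs = embed(facts)
      # cosine similarity between the document embedding and every stored fact
      sims = fact_vecs @ doc_vec / (
          np.linalg.norm(fact_vecs, axis=1) * np.linalg.norm(doc_vec) + 1e-9
      )
      top_k = np.argsort(-sims)[: min(k, len(facts))]
      return [facts[i] for i in top_k]
  ```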

2. How does ERASE decide whether a retrieved fact is reinforced, unaffected, or made false by a new document, and when to rewrite it?

  • ERASE prompts a language model to classify each retrieved fact according to whether the new document makes the fact more likely, less likely, or leaves it unaffected. Facts judged to have been made false are then rewritten by the LM into statements that are true given the new document (see the sketch below).
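
  A minimal sketch of this classify-then-rewrite step, assuming a generic llm callable that maps a prompt string to a completion. The prompt wording here is illustrative, not the paper's actual prompts.

  ```python
  from typing import Callable

  def update_retrieved_fact(fact: str, new_doc: str, llm: Callable[[str], str]) -> str:
      """Classify a retrieved fact against a new document and rewrite it if it is now false."""
      classify_prompt = (
          f"New document:\n{new_doc}\n\n"
          f"Stored fact: {fact}\n\n"
          "Does the document make this fact MORE LIKELY, LESS LIKELY, or leave it UNAFFECTED? "
          "Answer with exactly one of: MORE LIKELY, LESS LIKELY, UNAFFECTED."
      )
      label = llm(classify_prompt).strip().upper()

      if "LESS" in label:
          # The fact has been made false: ask the LM to rewrite it into a true statement.
          rewrite_prompt = (
              f"New document:\n{new_doc}\n\n"
              f"The following fact is now outdated: {fact}\n"
              "Rewrite the fact so that it is true given the document."
          )
          return llm(rewrite_prompt).strip()

      # MORE LIKELY (reinforced) or UNAFFECTED: keep the fact as written.
      return fact
  ```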

3. How does ERASE handle multi-hop edits in the conversation domain?

  • ERASE conditions the LM on previously reinforced facts when rewriting contradicted facts, to allow it to make multi-hop inferences about downstream implications of changes.
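
  A sketch of how that conditioning could look: the rewrite prompt simply includes the facts the new document reinforced as additional context. The llm callable and the prompt text are assumptions for illustration.

  ```python
  from typing import Callable, List

  def rewrite_with_reinforced_context(
      stale_fact: str,
      new_doc: str,
      reinforced_facts: List[str],
      llm: Callable[[str], str],
  ) -> str:
      """Rewrite a contradicted fact while conditioning on facts the new document reinforced,
      so that downstream (multi-hop) implications can be propagated into the rewrite."""
      confirmed = "\n".join(f"- {f}" for f in reinforced_facts) or "- (none)"
      prompt = (
          f"New document:\n{new_doc}\n\n"
          f"Facts confirmed by the document:\n{confirmed}\n\n"
          f"The following fact is now outdated: {stale_fact}\n"
          "Using both the document and the confirmed facts, rewrite the outdated fact so "
          "that it is true, carrying through any implications of the confirmed facts."
      )
      return llm(prompt).strip()
  ```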

[03] Experiments

1. What are the key findings from the experiments on the news and conversation datasets?

  • ERASE outperforms standard RAG baselines and long-context models, giving 7-13% (Mixtral-8x7B) and 6-10% (Llama-3-8B) absolute improvements in accuracy on the news domain and single-hop conversation subset.
  • On the multi-hop conversation subset, ERASE performs comparably to baselines, suggesting room for improvement in multi-hop memory editing.

2. What are some of the limitations of the ERASE approach identified in the paper?

  • ERASE struggles with multi-hop updates, due to limitations in the retrieval model and the language model's ability to reason about multi-hop edits.
  • Allowing LMs to directly edit the knowledge base risks introducing noise, which could snowball over long timescales.
