Learning to Filter Context for Retrieval-Augmented Generation
On-the-fly retrieval of relevant knowledge has proven an essential element of reliable systems for tasks such as open-domain question answering and fact verification. However, because retrieval systems are not perfect, generation models are required to generate outputs given partially or entirely irrelevant passages. This can cause over- or under-reliance on context, and result in problems in the generated output such as hallucinations. To alleviate these problems, we propose FILCO, a method that improves the quality of the context provided to the generator by (1) identifying useful context based on lexical and information-theoretic approaches, and (2) training context filtering models that can filter retrieved contexts at test time. We experiment on six knowledge-intensive tasks with FLAN-T5 and LLaMa2, and demonstrate that our method outperforms existing approaches on extractive question answering (QA), complex multi-hop and long-form QA, fact verification, and dialog generation tasks. FILCO effectively improves the quality of context, whether or not it supports the canonical output.
String Inclusion: a binary {0,1} judgment of whether the candidate text span contains the gold answer.
Lexical Overlap: among candidate text spans whose unigram overlap (F1 score) with the example ({q, o}) is at least 0.5, keep the ones with the highest similarity.
Conditional Cross-Mutual Information (CXMI): $\frac{M_{\text{gen}}(o \mid t \oplus q)}{M_{\text{gen}}(o \mid q)}$, the ratio of the generation model's probability of the output $o$ with and without the candidate span $t$ prepended to the query $q$ (see the sketch below).
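To make the three measures concrete, here is a minimal Python sketch. It assumes a hypothetical `gen_prob(output, input)` callable standing in for the generation model's probability $M_{\text{gen}}(o \mid \cdot)$ (e.g., a FLAN-T5 likelihood), uses plain string concatenation for the $t \oplus q$ prepend, and is not the paper's released implementation.

```python
from collections import Counter


def string_inclusion(span: str, answer: str) -> int:
    """1 if the candidate span contains the gold answer string, else 0."""
    return int(answer.lower() in span.lower())


def unigram_f1(span: str, reference: str) -> float:
    """Unigram overlap (F1) between a candidate span and the example text (q + o)."""
    span_tokens = Counter(span.lower().split())
    ref_tokens = Counter(reference.lower().split())
    overlap = sum((span_tokens & ref_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(span_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)


def cxmi(gen_prob, span: str, query: str, output: str) -> float:
    """Ratio of the generator's output probability with vs. without the span.

    `gen_prob(output, input)` is a hypothetical scorer for M_gen(o | input).
    """
    return gen_prob(output, span + " " + query) / gen_prob(output, query)


def filter_by_lexical_overlap(spans, query, output, f1_threshold=0.5):
    """Keep spans whose unigram F1 with (q, o) is >= the threshold, most similar first."""
    reference = query + " " + output
    scored = [(unigram_f1(s, reference), s) for s in spans]
    kept = [(f1, s) for f1, s in scored if f1 >= f1_threshold]
    return [s for _, s in sorted(kept, reverse=True)]
```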
Experiments are run on six knowledge-intensive tasks.
Model