Compressing Context to Enhance Inference Efficiency of Large Language Models
Large language models (LLMs) have achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations due to significantly increased computational requirements, in both memory and inference time, and possible context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach using common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on the tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and decreases generation latency while maintaining performance comparable to that achieved with the full context. Specifically, we achieve a 50% reduction in context cost, resulting in a 36% reduction in inference memory usage and a 32% reduction in inference time, while observing only a minor drop of .023 in BERTscore and .038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance.
The core idea: use self-information to prune away the parts of the context that the model already knows well.
We can calculate the self-information of a lexical unit by simply summing the self-information of the tokens in it.
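Written out in standard notation (not copied from the notes), with $x_t$ a token, $u$ a lexical unit, and $P$ the scoring language model's distribution:

$$I(x_t) = -\log P(x_t \mid x_1, \dots, x_{t-1}), \qquad I(u) = \sum_{x_t \in u} I(x_t)$$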
Context Pruning Sample
Computing Self-Information: selection is not done token by token; instead, the context is split into sentence-level lexical units and the self-information of each unit is computed, as in the sketch below.
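A minimal sketch of this step (not the authors' implementation; the model choice "gpt2", the single-space joining of units, and the helper name self_information_per_unit are my assumptions): score a list of pre-split sentence-level units by summing the negative log-probabilities of their tokens under a small causal LM.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small causal LM can serve as the scoring model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def self_information_per_unit(units: list[str]) -> list[float]:
    """Score each lexical unit as the sum of -log p(token | prefix) over its tokens."""
    text = " ".join(units)  # inputs longer than the model's window would need chunking
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    input_ids = enc["input_ids"][0]
    offsets = enc["offset_mapping"][0].tolist()
    with torch.no_grad():
        logits = model(input_ids.unsqueeze(0)).logits[0]
    # token t is predicted by the logits at position t-1, so the very first token gets no score
    log_probs = torch.log_softmax(logits[:-1], dim=-1)
    token_si = -log_probs[torch.arange(len(input_ids) - 1), input_ids[1:]]
    # character span of each unit inside the joined text (units separated by one space)
    spans, pos = [], 0
    for u in units:
        spans.append((pos, pos + len(u)))
        pos += len(u) + 1
    # a token contributes to the unit that contains its last character
    scores = []
    for begin, end in spans:
        total = sum(si.item() for si, (_, e) in zip(token_si, offsets[1:]) if begin < e <= end)
        scores.append(total)
    return scores
```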
Context Selection: rather than fixing an absolute self-information threshold, a percentile of the unit scores is chosen, which in effect sets the compression ratio (how much of the context gets dropped).
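A minimal sketch of the percentile-based selection (the function name select_context and the parameter p are illustrative, not from the paper): compute a percentile cutoff over the unit scores and drop every unit below it, so the percentile directly controls how much context is removed.

```python
import numpy as np

def select_context(units: list[str], scores: list[float], p: float = 50.0) -> str:
    """Keep units scoring at or above the p-th percentile; higher p means more compression."""
    threshold = np.percentile(scores, p)
    kept = [u for u, s in zip(units, scores) if s >= threshold]
    return " ".join(kept)
```

With p = 50, roughly the lower-scoring (most predictable) half of the units is discarded, which lines up with the 50% context reduction reported in the abstract.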
Reported results are averaged over all evaluated models.