Naive Bayes-based Context Extension for Large Language Models

Jianlin Su; Murtadha Ahmed; Wenbo; Luo Ao; Mingren Zhu; Yunfeng Liu

TL;DR

This work tackles the context-length limitation of in-context learning (ICL) in large language models by introducing Naive Bayes-based Context Extension (NBCE), which partitions demonstrations into multiple equal-length windows, selects a posterior window via voting, and applies Bayes' theorem to generate test outputs without any fine-tuning. The approach is a practical, linear-cost way to broaden contextual supervision: it approximates $p(T|S_1,...,S_n) \propto p(T)\prod_{k=1}^n p(S_k|T)$, where $S_1,...,S_n$ are the context windows and $T$ is the test output, and refines this approximation with pooling and a $\beta$ parameter, so it can be applied across models and tasks. Experimental results on text classification and multi-choice benchmarks show that NBCE often outperforms the parallel context window (PCW) baseline, with especially strong gains as the number of classes rises and with larger models, while remaining stable. The work also provides ablations and analyses of pooling strategies and hyperparameters, and releases code to facilitate adoption and further study of context extension in ICL.
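
The step from the naive Bayes approximation to quantities a language model can actually produce is worth spelling out. The derivation below is a sketch that follows from the stated approximation and Bayes' theorem alone; the paper's own equations may be arranged differently.

```latex
\begin{align*}
p(T \mid S_1,\ldots,S_n)
  &\propto p(T)\prod_{k=1}^{n} p(S_k \mid T)
  && \text{(naive Bayes approximation)} \\
  &= p(T)\prod_{k=1}^{n} \frac{p(T \mid S_k)\,p(S_k)}{p(T)}
  && \text{(Bayes' theorem on each factor)} \\
  &\propto \frac{\prod_{k=1}^{n} p(T \mid S_k)}{p(T)^{\,n-1}}
  && \text{(each $p(S_k)$ is constant in $T$)} \\[4pt]
\log p(T \mid S_1,\ldots,S_n)
  &= \sum_{k=1}^{n} \log p(T \mid S_k) \;-\; (n-1)\,\log p(T) \;+\; \text{const}
\end{align*}
```

Each $p(T \mid S_k)$ comes from running the model with a single window as context, and $p(T)$ from running it with no demonstrations, so the number of forward passes grows linearly with the number of windows $n$.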

Abstract

Large Language Models (LLMs) have shown promising in-context learning abilities. However, conventional In-Context Learning (ICL) approaches are often impeded by the length limitations of the transformer architecture, which pose challenges when attempting to effectively integrate supervision from a substantial number of demonstration examples. In this paper, we introduce a novel framework, called Naive Bayes-based Context Extension (NBCE), to enable existing LLMs to perform ICL with an increased number of demonstrations by significantly expanding their context size. Importantly, this expansion requires neither fine-tuning nor dependence on a particular model architecture, while preserving linear efficiency. NBCE first splits the context into equal-sized windows that fit the target LLM's maximum length. Then, it introduces a voting mechanism to select the most relevant window, regarded as the posterior context. Finally, it employs Bayes' theorem to generate the output for the test task. Our experimental results demonstrate that NBCE substantially enhances performance, particularly as the number of demonstration examples increases, consistently outperforming alternative methods. The NBCE code is available at: https://github.com/amurtadha/NBCE-master
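
The three steps in the abstract (split, vote, combine) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the released implementation: the function names are hypothetical, "voting" is taken here to mean picking the lowest-entropy (most confident) window, and the combination rule `(1 + beta) * pooled - beta * prior` is an assumed form of the $\beta$-weighted refinement that reduces to the naive Bayes sum when average pooling is used with $\beta = n - 1$. The per-window log-probabilities would come from the LLM; here they are passed in as plain lists.

```python
import math


def split_into_windows(demos, window_size):
    """Partition demonstrations into consecutive equal-sized windows.

    Assumption: leftovers that do not fill a whole window are dropped.
    """
    n_windows = len(demos) // window_size
    return [demos[i * window_size:(i + 1) * window_size] for i in range(n_windows)]


def log_softmax(scores):
    """Normalize raw scores into log-probabilities (numerically stable)."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def entropy(logp):
    """Shannon entropy of a distribution given as log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in logp)


def pool(window_logps, mode="vote"):
    """Reduce per-window log-probabilities to one vector.

    "vote": keep the window with the lowest entropy (assumed voting rule).
    "mean": average the log-probabilities across windows.
    """
    if mode == "vote":
        return min(window_logps, key=entropy)
    if mode == "mean":
        n = len(window_logps)
        return [sum(col) / n for col in zip(*window_logps)]
    raise ValueError(f"unknown pooling mode: {mode}")


def nbce_scores(window_logps, prior_logp, beta=0.25, mode="vote"):
    """Combine pooled conditional log-probs with the context-free prior.

    Assumed rule: (1 + beta) * pooled - beta * prior, then renormalize.
    """
    pooled = pool(window_logps, mode)
    combined = [(1.0 + beta) * c - beta * p for c, p in zip(pooled, prior_logp)]
    return log_softmax(combined)
```

For example, given one uninformative window and one confident window over three candidate labels, `nbce_scores([log_softmax([0.0, 0.0, 0.0]), log_softmax([0.0, 3.0, 0.0])], prior_logp=log_softmax([0.0, 0.0, 0.0]))` assigns the highest score to the label favored by the confident window.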

Paper Structure (23 sections, 12 equations, 4 figures, 15 tables)

Introduction
Approach
Experimental Setup
Datasets
Training Sampling and Models
Comparative Baseline
Prompt Formats
Evaluation
Classification Task Evaluation
Main Results
PCW enables ICL with a Large Number of Classes
Multi-Choice Tasks
Impact of more Demonstrations on ICL
Ablation Study
Effect of Pooling Mechanism
...and 8 more sections

Figures (4)

Figure 1: An example of our NBCE. Initially, NBCE divides the context into equal-sized windows, each with the maximum length compatible with the target LLM. Subsequently, a voting mechanism is introduced to select the most relevant context window, regarded as the posterior context. Finally, it employs Bayes' theorem to generate the output for the test task.
Figure 2: Average Performance Enhancements with NBCE over PCW as a Function of Label Count: Each data point in our analysis signifies the average improvement observed across all datasets on GPT2 models. It is worth noting a clear and positive correlation between the quantity of unique labels and the benefits derived from our NBCE.
Figure 3: Efficacy in terms of averaged accuracy and standard deviation (i.e., the error bars) of two pooling mechanisms: average context window (Eq. \ref{eq:mean}) and entropy-based maximization (Eq. \ref{eq:voting}) utilizing GPT2 models for text classification. Notably, the maximizing approach enhances both accuracy and stability, with model size impacting averaging pooling's performance.
Figure 4: Comparative analysis in terms of averaged accuracy and standard deviation (i.e., the error bars) of GPT2 model performance across varying $\beta$ values (0.25, 0.5, 0.75; Eq. \ref{eq:beta}) in a text classification task.
