Feature-Adaptive and Data-Scalable In-Context Learning

Jiahao Li; Quan Wang; Licheng Zhang; Guoqing Jin; Zhendong Mao

TL;DR

This work addresses the dual challenges of data scalability and task-specific adaptation in in-context learning (ICL). It introduces FADS-ICL, which decouples feature extraction from task adaptation by using the LLM as a general feature extractor and a lightweight modulator to refine features for the downstream task, supervised by beyond-context samples. Across 10 datasets and multiple model scales, FADS-ICL consistently outperforms vanilla ICL and kNN-based baselines, with substantial gains at 32 and 128 shots, and favorable computational overhead. The approach yields actionable insights into modulators, feature choices, and demonstrations, and highlights the practical potential of combining task-adaptive feature refinement with data-scalable inference in resource-constrained settings.

Abstract

In-context learning (ICL), which promotes inference with several demonstrations, has become a widespread paradigm to stimulate LLM capabilities for downstream tasks. However, because of context length constraints, ICL cannot be further improved even when more training data is available, and the general features taken directly from LLMs in ICL are not adapted to the specific downstream task. In this paper, we propose a feature-adaptive and data-scalable in-context learning framework (FADS-ICL), which leverages task-adaptive features to promote inference on the downstream task, under the supervision of beyond-context samples. Specifically, it first extracts general features of beyond-context samples one by one, feeding each to the LLM in ICL input form, and then introduces a task-specific modulator that, after being fitted to the downstream task, performs feature refinement and prediction. We conduct extensive experiments on FADS-ICL under varying data settings (4 to 128 shots) and LLM scales (0.8B to 70B). Experimental results show that FADS-ICL consistently outperforms previous state-of-the-art methods by a significant margin under all settings. For example, under the 1.5B, 32-shot setting, FADS-ICL gains +14.3 average accuracy over vanilla ICL on 10 datasets from feature adaptation, and +6.2 average accuracy over the previous state-of-the-art method; performance improves further with more training data. Code and data are publicly available at https://github.com/jiahaozhenbang/FADS-ICL.
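
The two-stage idea in the abstract (frozen general features, then a small task-specific modulator fitted on beyond-context samples) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the modulator is taken to be a plain softmax linear classifier, and `toy_features` is a hypothetical stand-in that draws synthetic vectors where the framework would use features extracted from the LLM in ICL input form. It is not the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearModulator:
    """Assumed modulator: a softmax linear classifier over frozen features."""

    def __init__(self, dim, num_classes):
        self.w = [[0.0] * dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def predict_proba(self, x):
        logits = [sum(wi * xi for wi, xi in zip(row, x)) + b
                  for row, b in zip(self.w, self.b)]
        return softmax(logits)

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)

    def fit(self, features, labels, lr=0.1, epochs=50):
        # Plain SGD on the cross-entropy loss; only the modulator is updated,
        # the feature extractor (the LLM in the framework) stays untouched.
        for _ in range(epochs):
            for x, y in zip(features, labels):
                probs = self.predict_proba(x)
                for c, row in enumerate(self.w):
                    grad = probs[c] - (1.0 if c == y else 0.0)
                    for j in range(len(row)):
                        row[j] -= lr * grad * x[j]
                    self.b[c] -= lr * grad
        return self


def toy_features(n_per_class, dim=4, seed=0):
    """Hypothetical stand-in for LLM features: one Gaussian cluster per class."""
    rng = random.Random(seed)
    features, labels = [], []
    for label, center in enumerate((-1.0, 1.0)):
        for _ in range(n_per_class):
            features.append([rng.gauss(center, 0.5) for _ in range(dim)])
            labels.append(label)
    return features, labels


# "Beyond-context" training samples: features are extracted one by one,
# so their number is not bounded by the context length.
train_x, train_y = toy_features(n_per_class=16, seed=0)
test_x, test_y = toy_features(n_per_class=16, seed=1)

modulator = LinearModulator(dim=4, num_classes=2).fit(train_x, train_y)
accuracy = sum(modulator.predict(x) == y
               for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because only the small modulator is trained, adding more labeled samples enlarges `train_x` without lengthening any single LLM input, which is the sense in which such a setup scales with data.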

Paper Structure (40 sections, 5 equations, 7 figures, 7 tables)

Introduction
Preliminary
ICL
KNN-prompting
FADS-ICL
Feature Extractor
Task-specific modulator
Experiments
Experimental Settings
Datasets
LLMs
Other Settings or Details
Evaluation
Baselines
Main Results
...and 25 more sections

Figures (7)

Figure 1: The effect of data scalability and feature adaptation on ICL. For data scalability, the performance of vanilla ICL cannot be further improved when exceeding a certain amount of data, but kNN-prompting and FADS-ICL can. For feature adaptation, FADS-ICL conducts feature refinement for specific tasks, so that it outperforms kNN-prompting using general features by large margins under all data settings.
Figure 2: The overall framework of FADS-ICL.
Figure 3: Results across LLM scales with 128 shots.
Figure 4: Comparison of computational and memory overhead on the MPQA dataset (256 test samples).
Figure 5: Left (a): The effect of different modulators in FADS-ICL. Middle (b): The effect of different features as the general features in FADS-ICL. Right (c): The role of demonstrations in FADS-ICL.
...and 2 more figures
