From Cross-Task Examples to In-Task Prompts: A Graph-Based Pseudo-Labeling Framework for In-context Learning

Zihan Chen, Song Wang, Xingbo Fu, Chengshuai Shi, Zhenyu Lei, Cong Shen, Jundong Li

TL;DR

The paper tackles the cost and reliability challenges of in-context learning on novel tasks by introducing a two-stage graph-based pipeline. GraphSim uses structure-aware embeddings and random GNNs to select transferable cross-task examples, while GLIP propagates label information from a small LLM-labeled seed set to the rest of the target data without further LLM queries. This combination yields a fully pseudo-labeled target dataset that enables high-quality in-task demonstrations with substantially reduced labeling costs. Across five target tasks and multiple LLMs, the method achieves competitive ICL performance, often approaching the in-task upper bound while significantly lowering reliance on expensive LLM labeling. The approach highlights the practical potential of integrating graph-based transfer and semi-supervised label propagation to scale ICL to new tasks in resource-constrained settings.

Abstract

The capability of in-context learning (ICL) enables large language models (LLMs) to perform novel tasks without parameter updates by conditioning on a few input-output examples. However, collecting high-quality examples for new or challenging tasks can be costly and labor-intensive. In this work, we propose a cost-efficient two-stage pipeline that reduces reliance on LLMs for data labeling. Our approach first leverages readily available cross-task examples to prompt an LLM and pseudo-label a small set of target task instances. We then introduce a graph-based label propagation method that spreads label information to the remaining target examples without additional LLM queries. The resulting fully pseudo-labeled dataset is used to construct in-task demonstrations for ICL. This pipeline combines the flexibility of cross-task supervision with the scalability of LLM-free propagation. Experiments across five tasks demonstrate that our method achieves strong performance while lowering labeling costs.
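The second stage, spreading labels from a small LLM-labeled seed set to the remaining target examples, can be illustrated with a generic label-spreading routine on a k-nearest-neighbor graph. This is a minimal sketch under stated assumptions, not the paper's GLIP method: the function names (`knn_affinity`, `propagate_labels`), the cosine-similarity graph, and the parameters `k`, `alpha`, and `iters` are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def knn_affinity(X, k=5):
    """Symmetric kNN affinity matrix from cosine similarity (assumes non-zero rows)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # exclude self-matches from the neighbor search
    n = len(X)
    cols = np.argsort(-S, axis=1)[:, :k]      # indices of the k most similar points per row
    rows = np.repeat(np.arange(n), k)
    W = np.zeros((n, n))
    W[rows, cols.ravel()] = np.clip(S[rows, cols.ravel()], 0.0, None)
    return np.maximum(W, W.T)                 # keep an edge if either endpoint selected it


def propagate_labels(X, seed_idx, seed_labels, num_classes, k=5, alpha=0.9, iters=50):
    """Spread seed (pseudo-)labels over the graph; no LLM queries are involved."""
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    W = knn_affinity(X, k)
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0                       # guard against isolated nodes
    d = 1.0 / np.sqrt(deg)
    S = W * d[:, None] * d[None, :]           # symmetric normalization D^{-1/2} W D^{-1/2}
    Y0 = np.zeros((len(X), num_classes))
    Y0[seed_idx, seed_labels] = 1.0           # one-hot rows for the seed set only
    F = Y0.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y0
    pred = F.argmax(axis=1)
    pred[seed_idx] = seed_labels              # seeds keep their original labels
    return pred


# Toy usage: two groups of embeddings pointing in different directions, one seed per group.
angles = np.concatenate([np.linspace(0, 0.2, 10),
                         np.linspace(np.pi / 2 - 0.2, np.pi / 2, 10)])
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)
pred = propagate_labels(X, seed_idx=[0, 19], seed_labels=[0, 1], num_classes=2, k=3)
print(pred)
```

In the pipeline described above, `seed_labels` would be the pseudo-labels that the LLM assigns using cross-task demonstrations, and `X` would hold embeddings of all target-task examples. The resulting `pred` array then plays the role of the fully pseudo-labeled dataset from which in-task demonstrations are drawn.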

Paper Structure

This paper contains 25 sections, 13 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: (Top) Wasserstein distance between source (column) and target (row) task example embeddings. (Bottom) Examples of task label spaces.
  • Figure 2: Overview of our proposed pipeline for cross-task pseudo-labeling. We first use (a) GraphSim to select relevant examples from the source task to pseudo-label a small set of target task examples ${\mathcal{D}}^L$ via ICL. Then, we apply (b) GLIP, a graph-based label propagation method, to infer labels for the remaining unlabeled target samples ${\mathcal{D}}^U$. The resulting fully pseudo-labeled target set is used to construct in-task examples for in-task ICL.
  • Figure 3: Ablation Study of GraphSim Components.
  • Figure 4: Accuracy variation with respect to the number of ICL examples using LLaMA2-7B.