From Cross-Task Examples to In-Task Prompts: A Graph-Based Pseudo-Labeling Framework for In-context Learning
Zihan Chen, Song Wang, Xingbo Fu, Chengshuai Shi, Zhenyu Lei, Cong Shen, Jundong Li
TL;DR
The paper tackles the cost and reliability challenges of in-context learning on novel tasks by introducing a two-stage graph-based pipeline. GraphSim uses structure-aware embeddings and random GNNs to select transferable cross-task examples, while GLIP propagates label information from a small LLM-labeled seed set to the rest of the target data without further LLM queries. This combination yields a fully pseudo-labeled target dataset that enables high-quality in-task demonstrations with substantially reduced labeling costs. Across five target tasks and multiple LLMs, the method achieves competitive ICL performance, often approaching the in-task upper bound while significantly lowering reliance on expensive LLM labeling. The approach highlights the practical potential of integrating graph-based transfer and semi-supervised label propagation to scale ICL to new tasks in resource-constrained settings.
Abstract
In-context learning (ICL) enables large language models (LLMs) to perform novel tasks without parameter updates by conditioning on a few input-output examples. However, collecting high-quality examples for new or challenging tasks can be costly and labor-intensive. In this work, we propose a cost-efficient two-stage pipeline that reduces reliance on LLMs for data labeling. Our approach first leverages readily available cross-task examples to prompt an LLM and pseudo-label a small set of target task instances. We then introduce a graph-based label propagation method that spreads label information to the remaining target examples without additional LLM queries. The resulting fully pseudo-labeled dataset is used to construct in-task demonstrations for ICL. This pipeline combines the flexibility of cross-task supervision with the scalability of LLM-free propagation. Experiments across five tasks demonstrate that our method achieves strong performance while lowering labeling costs.
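To make the second stage concrete, the following is a minimal sketch of label propagation from a small seed set over a similarity graph. It is not the paper's GLIP method, whose details are not given here; it uses the classic label-spreading iteration on a cosine k-nearest-neighbor graph as a stand-in. The function name `propagate_labels`, the parameters `k`, `alpha`, and `iters`, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np


def propagate_labels(embeddings, seed_idx, seed_labels, num_classes,
                     k=2, alpha=0.9, iters=50):
    """Spread seed labels over a kNN graph (illustrative stand-in, not GLIP)."""
    X = np.asarray(embeddings, dtype=float)
    n = X.shape[0]

    # Cosine similarity between all pairs of examples.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from neighbor search

    # Keep the k most similar neighbors of each node, then symmetrize.
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.maximum(sim[rows, cols], 0.0)
    W = np.maximum(W, W.T)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One-hot seed matrix; rows for unlabeled examples stay at zero.
    Y = np.zeros((n, num_classes))
    Y[np.asarray(seed_idx), np.asarray(seed_labels)] = 1.0

    # Iterate F <- alpha * S F + (1 - alpha) * Y.
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(axis=1)


# Toy data: two groups of embeddings, one seed label per group
# (the seed labels stand in for the small LLM-labeled set).
emb = [[1, 0.0], [1, 0.1], [1, 0.2], [1, -0.1],
       [0, 1.0], [0.1, 1], [0.2, 1], [-0.1, 1]]
pred = propagate_labels(emb, seed_idx=[0, 4], seed_labels=[0, 1], num_classes=2)
print(pred.tolist())
```

In this sketch only the two seed examples require a label from an external source; every other example receives its pseudo-label from the graph alone, which mirrors the cost argument made above. The choice of graph construction and propagation rule in the actual method may differ.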
