In-Context Semi-Supervised Learning
Jiashuo Fan, Paul Rosu, Aaron T. Wang, Michael Li, Lawrence Carin, Xiang Cheng
TL;DR
The paper addresses label scarcity in in-context learning by proposing in-context semi-supervised learning (IC-SSL), in which a Transformer first learns a geometry-aware representation from unlabeled data and then performs in-context supervised inference with a small number of labels. It introduces a two-stage Transformer: the first stage learns a representation by computing Laplacian-based eigenmaps, and the second implements gradient-descent-like in-context learning for categorical prediction. The model is trained end-to-end, but its architecture carries a mechanistic, interpretable inductive bias. Across synthetic manifolds, product manifolds, and image manifolds (including ImageNet100), the approach yields strong low-label performance and robust out-of-distribution transfer, outperforming baselines that rely on offline Laplacian embeddings or plain ICL. The work demonstrates that Transformers can extract and exploit unlabeled geometric structure in-context, and it offers a principled, geometry-aware account of how attention and MLP layers realize semi-supervised inference. These results provide a foundation for understanding and leveraging unlabeled context in scalable, cross-domain Transformer applications.
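To make the two stages concrete, the following is a minimal offline NumPy sketch, not the paper's Transformer. It computes Laplacian eigenmaps explicitly from a Gaussian-kernel graph over all points (labeled and unlabeled), then fits a softmax classifier by gradient descent on a handful of labels. This is closer in spirit to the offline Laplacian-embedding baseline than to IC-SSL itself, where both stages are carried out in-context by a trained model. The two-ring dataset, the bandwidth, the step size, and all function names are illustrative assumptions.

```python
import numpy as np

def two_rings(n_per_ring=100, noise=0.05, seed=0):
    """Toy data (assumption): two noisy concentric rings, one class per ring."""
    rng = np.random.default_rng(seed)
    angles = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
    circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    X = np.concatenate([1.0 * circle, 2.5 * circle])
    X = X + noise * rng.normal(size=X.shape)
    y = np.repeat([0, 1], n_per_ring)
    return X, y

def laplacian_eigenmaps(X, n_dims=1, bandwidth=0.3):
    """Stage 1: embed every point using only the unlabeled geometry."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth ** 2))  # Gaussian affinities
    np.fill_diagonal(W, 0.0)                        # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                 # eigenvalues ascending
    # Skip the trivial eigenvector; D^{-1/2} v solves L f = lambda D f.
    emb = vecs[:, 1:n_dims + 1] * d_inv_sqrt[:, None]
    return (emb - emb.mean(axis=0)) / emb.std(axis=0)

def fit_softmax_gd(Z, y, n_classes, steps=300, lr=0.5):
    """Stage 2: plain gradient descent on the softmax cross-entropy loss."""
    A = np.hstack([Z, np.ones((len(Z), 1))])        # append a bias column
    theta = np.zeros((A.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                        # one-hot targets
    for _ in range(steps):
        logits = A @ theta
        logits -= logits.max(axis=1, keepdims=True) # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        theta -= lr * A.T @ (P - Y) / len(A)
    return theta

def predict(Z, theta):
    A = np.hstack([Z, np.ones((len(Z), 1))])
    return (A @ theta).argmax(axis=1)

X, y = two_rings()
labeled = np.array([0, 50, 100, 150])               # two labels per ring

# Geometry-aware features: the embedding uses all 200 points, no labels.
Z = laplacian_eigenmaps(X)
theta = fit_softmax_gd(Z[labeled], y[labeled], n_classes=2)
accuracy = (predict(Z, theta) == y).mean()

# Contrast: the same classifier on raw coordinates, where concentric
# rings are not linearly separable.
theta_raw = fit_softmax_gd(X[labeled], y[labeled], n_classes=2)
raw_accuracy = (predict(X, theta_raw) == y).mean()

print("eigenmap features:", accuracy)
print("raw coordinates:  ", raw_accuracy)
```

A single embedding coordinate is used here because, for two weakly connected rings, the first nontrivial eigenvector is nearly constant on each ring. Richer manifolds would generally need more coordinates (`n_dims`), and the kernel bandwidth would need tuning to the data scale.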
Abstract
There has been significant recent interest in understanding the capacity of Transformers for in-context learning (ICL), yet most theory focuses on supervised settings with explicitly labeled pairs. In practice, Transformers often perform well even when labels are sparse or absent, suggesting that unlabeled contextual demonstrations carry crucial structure. We introduce and study in-context semi-supervised learning (IC-SSL), where a small set of labeled examples is accompanied by many unlabeled points, and show that Transformers can leverage the unlabeled context to learn a robust, context-dependent representation. This representation enables accurate predictions and markedly improves performance in low-label regimes, offering foundational insights into how Transformers exploit unlabeled context for representation learning within the ICL framework.
