DeepOSets: Non-Autoregressive In-Context Learning with Permutation-Invariance Inductive Bias
Shao-Ting Chiu, Junyuan Hong, Ulisses Braga-Neto
TL;DR
This work shows that in-context learning (ICL) for regression can arise in a non-autoregressive, permutation-invariant architecture called DeepOSets, which fuses set learning (DeepSets) with operator learning (DeepONets). It proves a representation theorem stating that a permutation-invariant ICL operator $\Phi_n$ can be decomposed into a continuous encoder and a continuous decoding operator, and it shows that DeepOSets are universal approximators of such operators. Empirically, DeepOSets achieves accurate ICL on linear, shallow neural network, and polynomial regression tasks with far fewer parameters and faster training than autoregressive transformers. The Set Transformer variant (DeepOSets-T) offers higher accuracy in high-dimensional settings at the cost of increased complexity, which inducing-point techniques (DeepOSets-TI) mitigate. The results point to efficient, parallelizable ICL and potential auto-ML capabilities for in-prompt model selection, with practical implications for scalable, robust meta-learning and operator learning in regression problems.
Abstract
In-context learning (ICL) is the remarkable ability displayed by some machine learning models to learn from examples provided in a user prompt without any model parameter updates. ICL was first observed in the domain of large language models, and it has been widely assumed that it is a product of the attention mechanism in autoregressive transformers. In this paper, using stylized regression learning tasks, we demonstrate that ICL can emerge in a non-autoregressive neural architecture with a hard-coded permutation-invariance inductive bias. This novel architecture, called DeepOSets, combines the set learning properties of the DeepSets architecture with the operator learning capabilities of Deep Operator Networks (DeepONets). We provide a representation theorem for permutation-invariant regression learning operators and prove that DeepOSets are universal approximators of this class of operators. We performed comprehensive numerical experiments to evaluate the capabilities of DeepOSets in learning linear, polynomial, and shallow neural network regression, under varying noise levels, dimensionalities, and sample sizes. In the high-dimensional regime, accuracy was enhanced by replacing the DeepSets layer with a Set Transformer. Our results show that DeepOSets deliver accurate and fast results with an order of magnitude fewer parameters than a comparable transformer-based alternative.
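To make the combination of DeepSets and DeepONets concrete, the following is a minimal, untrained sketch of a forward pass in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the mean pooling, tanh activations, layer widths, and all names (`mlp_params`, `deeposets_predict`, `phi`, `branch`, `trunk`) are hypothetical choices made here. The prompt examples $(x_i, y_i)$ are encoded and pooled into a single vector (the DeepSets part), which feeds a branch network whose output is combined with a trunk network evaluated at the query points (the DeepONet part).

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes):
    # Random, untrained weights for a small MLP (illustration only).
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    # tanh hidden layers, linear output layer.
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

# Hypothetical sizes: input dimension, set-embedding width, number of basis functions.
d, h, p = 1, 32, 16
phi = mlp_params([d + 1, 64, h])   # per-example encoder applied to (x_i, y_i)
branch = mlp_params([h, 64, p])    # branch net: pooled embedding -> coefficients
trunk = mlp_params([d, 64, p])     # trunk net: query point -> basis values

def deeposets_predict(X, y, Xq):
    """Predict at query points Xq (m, d) from in-context examples X (n, d), y (n,)."""
    pairs = np.concatenate([X, y[:, None]], axis=1)  # (n, d + 1)
    z = mlp(phi, pairs).mean(axis=0)                 # (h,), order-independent pooling
    coeffs = mlp(branch, z)                          # (p,)
    basis = mlp(trunk, Xq)                           # (m, p)
    return basis @ coeffs                            # (m,)

# In-context examples from a noisy linear function, plus five query points.
X = rng.uniform(-1.0, 1.0, (10, d))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=10)
Xq = np.linspace(-1.0, 1.0, 5)[:, None]

pred = deeposets_predict(X, y, Xq)

# Shuffling the prompt examples leaves the prediction unchanged (up to rounding).
perm = rng.permutation(10)
pred_perm = deeposets_predict(X[perm], y[perm], Xq)
```

Because all examples are processed in a single pass and pooled, nothing is generated token by token, and the prediction does not depend on the order of the examples. With untrained weights the predictions are meaningless; the sketch only illustrates the data flow and the invariance property.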
