Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling

Tung Nguyen; Aditya Grover

TL;DR

Transformer Neural Processes (TNPs) recast uncertainty-aware meta-learning as sequence modeling, using a transformer backbone and an autoregressive objective to predict target values conditioned on context. By enforcing context invariance and target equivariance and offering diagonal and non-diagonal covariance variants, TNPs achieve strong performance across meta-regression, image completion, contextual bandits, and Bayesian optimization without relying on latent-variable ELBOs. The approach yields state-of-the-art results on several benchmarks, highlights efficient covariances (diagonal or Cholesky/low-rank), and demonstrates favorable scalability and calibration properties. This work provides a unified, scalable framework for uncertainty-aware meta-learning with practical impact on sequential decision making and function-learning tasks.

Abstract

Neural Processes (NPs) are a popular class of approaches for meta-learning. Similar to Gaussian Processes (GPs), NPs define distributions over functions and can estimate uncertainty in their predictions. However, unlike GPs, NPs and their variants suffer from underfitting and often have intractable likelihoods, which limit their applications in sequential decision making. We propose Transformer Neural Processes (TNPs), a new member of the NP family that casts uncertainty-aware meta-learning as a sequence modeling problem. We learn TNPs via an autoregressive likelihood-based objective and instantiate it with a novel transformer-based architecture. The model architecture respects the inductive biases inherent to the problem structure, such as invariance to the observed data points and equivariance to the unobserved points. We further investigate knobs within the TNP framework that trade off the expressivity of the decoding distribution against extra computation. Empirically, we show that TNPs achieve state-of-the-art performance on various benchmark problems, outperforming all previous NP variants on meta-regression, image completion, contextual multi-armed bandits, and Bayesian optimization.

Paper Structure (50 sections, 12 equations, 19 figures, 17 tables)

Introduction
Background
Uncertainty-Aware Meta Learning
Neural Processes
Transformers
Transformer Neural Processes
Autoregressive Transformer Neural Process
Diagonal Transformer Neural Process
Non-Diagonal Transformer Neural Process
Experiments
1-D Regression
Image completion
Contextual bandits
Bayesian Optimization
Memory and time complexity of TNPs
...and 35 more sections

Figures (19)

Figure 1: Illustration of the TNP-A architecture. The architecture specifies a custom masking pattern between the contexts, targets, and padded targets to respect autoregressive prediction order. For TNP-D and TNP-ND, we remove the targets from the input sequence.
Figure 2: An example mask with $N=5$ and $m=2$. Each token is allowed to attend to the filled tokens on its corresponding row. The context points $(x_i, y_i)_{i=1}^2$ attend only among themselves. Each target point $(x_i, y_i)$ for $i > 2$ attends to the context points and to the target points up to and including itself, $(x_j,y_j)_{j=3}^i$. Each padded target point $(x_i, 0)$ for $i > 2$ attends to the context points and to the strictly preceding target points $(x_j,y_j)_{j=3}^{i-1}$. This ensures that the prediction for $y_i$ ($i>2$) depends only on the context and the previous target points.
Figure 3: Completed images produced by the best baseline and TNPs from $100$ context points. Original images are drawn randomly from EMNIST unseen classes. For stochastic models, we sample multiple times and average the results.
Figure 4: The wheel bandit problem with varying values of $\delta$.
Figure 5: Regret performance on 1D BO tasks. For each kernel, we generate $100$ functions and report the mean and standard deviation.
...and 14 more figures
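
The masking pattern described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `build_tnp_a_mask`, the token ordering (contexts, then real targets, then padded targets), and the boolean convention (`True` means "may attend") are all assumptions made for this example. It also assumes that "the context points attend to themselves" means every context token sees every context token.

```python
import numpy as np


def build_tnp_a_mask(num_points: int, num_context: int) -> np.ndarray:
    """Boolean attention mask for the Figure 2 pattern (hypothetical helper).

    Assumed token order: m context tokens, then T real target tokens
    (x_i, y_i), then T padded target tokens (x_i, 0), with T = N - m.
    mask[q, k] is True when query token q may attend to key token k.
    """
    if not 0 < num_context < num_points:
        raise ValueError("need 0 < num_context < num_points")

    m = num_context
    t = num_points - num_context
    size = m + 2 * t
    mask = np.zeros((size, size), dtype=bool)

    # Every token (context, target, padded target) attends to all contexts.
    mask[:, :m] = True

    # Real target i attends to real targets up to and including itself.
    mask[m:m + t, m:m + t] = np.tril(np.ones((t, t), dtype=bool))

    # Padded target i attends only to strictly earlier real targets,
    # so the prediction for y_i never sees y_i.
    mask[m + t:, m:m + t] = np.tril(np.ones((t, t), dtype=bool), k=-1)

    # No token attends to padded targets: those columns stay False.
    return mask


if __name__ == "__main__":
    # The N = 5, m = 2 case from the caption gives an 8 x 8 mask.
    print(build_tnp_a_mask(5, 2).astype(int))
```

Rows are queries and columns are keys, so each row lists the tokens that one position may read. Most attention implementations expect either this boolean form or an additive mask with large negative values at the disallowed positions; which one applies depends on the library in use.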
