Exploring Pseudo-Token Approaches in Transformer Neural Processes

Jose Lara-Rangel, Nanze Chen, Fengzhe Zhang

TL;DR

The paper tackles the quadratic computational cost of Transformer Neural Processes (TNPs) by introducing Induced Set Attentive Neural Processes (ISANP) and ISANP-2, which encode the context into a fixed set of latent tokens and apply cross-attention for conditioning and querying. By combining pseudo-token representations with a two-stage attention mechanism, ISANPs achieve performance competitive with state-of-the-art TNPs while letting the number of latent tokens $L$ trade accuracy against compute. Empirical results across 1D meta-regression, image completion, contextual bandits, and Bayesian optimization show that ISANPs outperform LBANPs and closely match TNPs, with clear scalability advantages as context sizes grow. The work thus provides a practical, scalable alternative for uncertainty-aware meta-learning in real-world settings; future directions include higher-dimensional tasks and using more latent tokens to further close the gap with full TNPs.

Abstract

Neural Processes (NPs) have gained attention in meta-learning for their ability to quantify uncertainty, together with their rapid prediction and adaptability. However, traditional NPs are prone to underfitting. Transformer Neural Processes (TNPs) significantly outperform existing NPs, yet their applicability in real-world scenarios is hindered by computational complexity that is quadratic in the number of context and target data points. To address this, pseudo-token-based TNPs (PT-TNPs) have emerged as a novel subset of NPs that condenses context data into latent vectors, or pseudo-tokens, reducing computational demands. We introduce the Induced Set Attentive Neural Processes (ISANPs), which employ Induced Set Attention and an innovative query phase to improve querying efficiency. Our evaluations show that ISANPs perform competitively with TNPs and often surpass state-of-the-art models in 1D regression, image completion, contextual bandits, and Bayesian optimization. Crucially, ISANPs offer a tunable balance between performance and computational complexity, and scale well to larger datasets where TNPs face limitations.
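To make the pseudo-token idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes single-head attention with no learned projections, MLPs, or layer normalization, and the function names (`cross_attention`, `pseudo_token_predict`) and the sizes `N`, `M`, `L`, `d` are illustrative choices. It shows how routing attention through `L` latent tokens replaces attention-score matrices that grow quadratically in the number of points with matrices that grow linearly.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention (assumption: no learned weights)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)
    return weights @ keys_values, scores.shape

def pseudo_token_predict(context, targets, latents):
    # Stage 1: latents attend to the context -> (L, d); score matrix is L x N.
    summary, s1 = cross_attention(latents, context)
    # Stage 2: context attends back to the summary -> (N, d); score matrix is N x L.
    refined_context, s2 = cross_attention(context, summary)
    # Query: targets attend only to the L summary tokens -> (M, d); score matrix is M x L.
    target_repr, s3 = cross_attention(targets, summary)
    return target_repr, refined_context, [s1, s2, s3]

rng = np.random.default_rng(0)
N, M, L, d = 512, 64, 8, 16  # context size, target size, latent count, feature width
context = rng.normal(size=(N, d))
targets = rng.normal(size=(M, d))
latents = rng.normal(size=(L, d))  # would be learned parameters in a real model

target_repr, refined_context, score_shapes = pseudo_token_predict(context, targets, latents)

# Number of attention scores computed with pseudo-tokens vs. full self-attention
# over all context and target points.
attention_entries = sum(rows * cols for rows, cols in score_shapes)
full_self_attention_entries = (N + M) ** 2
print(attention_entries, full_self_attention_entries)
```

In this sketch the score count is `2*N*L + M*L`, so doubling the context size doubles the cost, whereas full self-attention over `N + M` points would roughly quadruple it. Increasing `L` buys a richer context summary at proportionally higher cost, which is the tunable trade-off the abstract describes.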

Paper Structure

This paper contains 18 sections, 3 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: LBANP architecture together with the two proposed ISANPs architectures. $CA$ stands for cross-attention and $SA$ stands for self-attention.
  • Figure 2: Model performance on CelebA64 using 8 latent vectors.
  • Figure 3: 1D Meta-Regression sample functions produced by different models.
  • Figure 4: Model performance on EMNIST using 8 latent vectors.
  • Figure 5: Visualizing NPs' initial and eventual strategies.
  • ...and 4 more figures