
Robust Neural Processes for Noisy Data

Chen Shapira, Dan Rosenbaum

TL;DR

A simple method for training NP models is proposed that makes them more robust to noisy data; models trained this way outperform all other NP models at all noise levels.

Abstract

Models that adapt their predictions based on a given context, a capability known as in-context learning, have become ubiquitous in recent years. We study the behavior of such models when the data is contaminated by noise. To this end, we use the Neural Processes (NP) framework as a simple and rigorous way to learn a distribution over functions, where predictions are conditioned on a set of context points. Using this framework, we find that the models that perform best on clean data differ from those that perform best on noisy data. Specifically, models that process the context using attention are more severely affected by noise, which leads to in-context overfitting. We propose a simple method for training NP models that makes them more robust to noisy data. Experiments on 1D functions and 2D image datasets demonstrate that our method yields models that outperform all other NP models at all noise levels.
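The abstract contrasts context-averaging NP models with attention-based ones. The toy sketch below is only an illustration of that contrast, not the authors' models and not their robust training method. Both "encoders" are hypothetical fixed functions: mean aggregation summarizes the context by the average of its y-values, and attention uses softmax weights over negative squared distances in place of learned query/key projections. The function names (`mean_aggregate_predict`, `attention_predict`, `make_noisy_context`) and the sine test function are likewise assumptions made for illustration.

```python
import math
import random


def mean_aggregate_predict(context_y):
    """Context averaging, heavily simplified: the context is summarized by the
    mean of its y-values (a hypothetical, non-learned stand-in for an NP encoder
    followed by mean pooling). Independent noise averages out, but the same
    value is predicted at every target, so the prediction can underfit."""
    return sum(context_y) / len(context_y)


def attention_predict(x_target, context_x, context_y, temperature):
    """Attention over context points, heavily simplified: softmax weights from
    negative squared distances (a hypothetical fixed kernel, not a learned
    query/key projection). A small temperature makes each target copy nearby
    context values, noise included."""
    scores = [-((x_target - x) ** 2) / temperature for x in context_x]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, context_y)) / total


def make_noisy_context(n, noise_std, seed=0):
    """Evenly spaced x in [0, 1]; y = sin(2*pi*x) plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_noisy_context(n=20, noise_std=0.3)
    # Held-out targets halfway between neighboring context points.
    targets = [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)]
    predictors = [
        ("mean", lambda t: mean_aggregate_predict(ys)),
        ("attention, sharp", lambda t: attention_predict(t, xs, ys, 1e-4)),
        ("attention, smooth", lambda t: attention_predict(t, xs, ys, 1e-2)),
    ]
    for name, predict in predictors:
        errors = [(predict(t) - math.sin(2 * math.pi * t)) ** 2 for t in targets]
        print(f"{name:>18}: MSE vs clean function = {sum(errors) / len(errors):.3f}")
```

In this toy, a very small temperature reproduces the noisy context value at each context location (an analogue of in-context overfitting), while a very large temperature collapses attention to the plain context mean (an analogue of the underfitting seen in context-averaging models).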

Paper Structure

This paper contains 14 sections, 4 equations, 12 figures, and 3 tables.

Figures (12)

  • Figure 1: Prediction accuracy for three different noise setups. Left: When training on clean data, models perform significantly worse as the context becomes noisier at test time; the effect is extreme in attention-based models. Middle: Training with noisy contexts allows the models to learn to filter out the noise and predict only the clean signal. Right: In this more realistic setup, models trained on noisy data cannot separate the noise from the signal; again, attention-based models are more affected.
  • Figure 2: Target log-likelihood (computed with importance sampling) for three different datasets of 1D functions. Applying our robust training method on top of an attention-based model (r-anp) results in better predictions at almost all noise levels, compared to all baseline models.
  • Figure 3: Examples of function predictions for models trained on Gaussian Process data with an RBF kernel. The prefix clean- denotes models trained on clean data. For models trained on clean data, a noisy context causes severe in-context overfitting. Standard training with noisy data improves performance, but still results in overfitting for attention-based models (anp) and underfitting for context-averaging models (np). Our method produces robust predictions, with a better balance between adapting to the context points and staying close to the function prior. Furthermore, our method better captures the inherent global uncertainty, as can be seen from the larger spread of the predicted function samples (the solid curves) relative to a smaller pointwise std (shaded area).
  • Figure 4: Left: Target log-likelihood for the CelebA face image dataset. Applying our robust training method on top of a bootstrapping+attention-based model (r-banp) results in better predictions at all noise levels, compared to all baseline models. Right: The same results, shown as the difference in target log-likelihood relative to the np model, emphasizing the gap at different noise levels.
  • Figure 5: Examples of using NP models on image data treated as 2D functions. Given a context of observed pixels, the models must predict the values of all pixels. This is especially challenging because the models never see clean images during training. At two different noise levels, attention-based models (banp) overfit to the context noise, while context-averaging models (np) oversmooth the prediction. Interestingly, conditioning on a context with more points (bottom) can lead to worse performance, which is especially evident in banp. Our method, r-banp, achieves a better tradeoff between the noisy context and the prior distribution over the underlying function.
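Figure 2 reports target log-likelihoods computed with importance sampling. The generic estimator is log p(y_T | C) ≈ log (1/K) Σ_k p(y_T | z_k) p(z_k | C) / q(z_k), with z_k drawn from a proposal q. The sketch below applies it to a hypothetical toy model (a scalar latent z with a Gaussian conditional prior, and i.i.d. Gaussian targets given z) where the exact answer is known. It is not the paper's evaluation code. In latent NP models the proposal is commonly an encoder distribution conditioned on both context and targets; here it is simply a Gaussian passed in by the caller, and all function names are assumptions.

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of N(mean, std**2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def is_target_loglik(targets_y, prior, proposal, obs_std, num_samples=1000, seed=0):
    """Importance-sampling estimate of log p(y_T | C) for a toy latent model:
    z ~ N(prior) plays the role of the context-conditioned prior p(z | C),
    y_j | z ~ N(z, obs_std**2) independently, and z_k ~ q = N(proposal)."""
    rng = random.Random(seed)
    mu_p, sd_p = prior
    mu_q, sd_q = proposal
    log_weights = []
    for _ in range(num_samples):
        z = rng.gauss(mu_q, sd_q)
        log_lik = sum(normal_logpdf(y, z, obs_std) for y in targets_y)
        # log importance weight: log p(y_T|z) + log p(z|C) - log q(z)
        log_weights.append(log_lik + normal_logpdf(z, mu_p, sd_p) - normal_logpdf(z, mu_q, sd_q))
    # Average the weights in log space: logsumexp(log w) - log K.
    return logsumexp(log_weights) - math.log(num_samples)


if __name__ == "__main__":
    prior, obs_std = (0.0, 1.0), 0.5
    # For a single target, the exact marginal is y ~ N(0, obs_std**2 + 1).
    exact = normal_logpdf(0.7, 0.0, math.sqrt(obs_std**2 + 1.0))
    naive = is_target_loglik([0.7], prior, proposal=prior, obs_std=obs_std, num_samples=20000)
    print(f"exact={exact:.4f}  IS with prior as proposal={naive:.4f}")
```

If the proposal equals the exact posterior (here N(0.56, 0.2)), every weight equals p(y_T | C) and the estimate is exact. Using the prior as the proposal is unbiased for the likelihood itself but noisier, which is why a data-informed proposal is usually preferred.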