Temporal Test-Time Adaptation with State-Space Models

Mona Schirmer, Dan Zhang, Eric Nalisnick

TL;DR

The paper addresses distribution shifts that evolve gradually over time after a model is deployed, a setting it calls temporal test-time adaptation (TempTTA). It introduces STAD, a Bayesian state-space method that tracks time-varying class prototypes in the representation space and uses them as the last-layer classification head, adapting without labels. STAD comes in two variants, STAD-Gauss and STAD-vMF. Both use EM-based inference to update the class prototypes as the data stream evolves: STAD-Gauss with Kalman-filter-like updates, and STAD-vMF with variational EM on the hypersphere using von Mises-Fisher (vMF) distributions. On real-world temporal shifts and synthetic benchmarks, STAD performs strongly under label shift and small batch sizes, and it also applies beyond temporal shifts. The method depends on the shift being observable in the last-layer representations and on the dynamics being gradual. Overall, the work offers a scalable, probabilistic mechanism for maintaining performance in non-stationary environments without labels.

Abstract

Distribution shifts between training and test data are inevitable over the lifecycle of a deployed model, leading to performance decay. Adapting a model on test samples can help mitigate this drop in performance. However, most test-time adaptation methods have focused on synthetic corruption shifts, leaving a variety of distribution shifts underexplored. In this paper, we focus on distribution shifts that evolve gradually over time, which are common in the wild but challenging for existing methods, as we show. To address this, we propose STAD, a Bayesian filtering method that adapts a deployed model to temporal distribution shifts by learning the time-varying dynamics in the last set of hidden features. Without requiring labels, our model infers time-evolving class prototypes that act as a dynamic classification head. Through experiments on real-world temporal distribution shifts, we show that our method excels in handling small batch sizes and label shift.
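
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes isotropic Gaussian prototypes, random-walk dynamics, and a single soft-assignment pass per batch, and the names `track_prototypes`, `q`, and `r` are hypothetical. Each step grows the prototype uncertainty (predict), softly assigns unlabeled features to classes, and then applies a Kalman-style update to each prototype.

```python
import numpy as np

def track_prototypes(W, P, H, q=0.05, r=0.25):
    """One predict/assign/update step (illustrative assumptions throughout).

    W: (K, D) prototype means, P: (K,) isotropic prototype variances,
    H: (N, D) unlabeled features, q: drift noise, r: observation noise.
    """
    # Predict: random-walk dynamics keep the mean and inflate the variance.
    P_pred = P + q
    # Soft assignment from Gaussian log-likelihoods under the predicted prototypes.
    d2 = ((H[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)   # (N, K)
    var = P_pred + r                                           # (K,)
    loglik = -0.5 * d2 / var - 0.5 * H.shape[1] * np.log(var)
    loglik -= loglik.max(axis=1, keepdims=True)                # numerical stability
    resp = np.exp(loglik)
    resp /= resp.sum(axis=1, keepdims=True)
    # Update: Kalman gain for n_k soft-counted observations with noise r.
    n_k = resp.sum(axis=0)
    mean_k = (resp.T @ H) / np.maximum(n_k, 1e-8)[:, None]
    gain = P_pred * n_k / (P_pred * n_k + r)
    W_new = W + gain[:, None] * (mean_k - W)
    P_new = (1.0 - gain) * P_pred
    return W_new, P_new, resp

# Toy stream: two class means drift upward; labels are never shown to the tracker.
rng = np.random.default_rng(0)
true_means = np.array([[-3.0, 0.0], [3.0, 0.0]])
W_source = true_means.copy()            # stands in for the source-model head
W, P = W_source.copy(), np.full(2, 0.1)
for t in range(20):
    true_means = true_means + np.array([0.0, 0.2])
    labels = rng.integers(0, 2, size=16)
    H = true_means[labels] + 0.5 * rng.normal(size=(16, 2))
    W, P, resp = track_prototypes(W, P, H)
    preds = resp.argmax(axis=1)          # prototypes act as the classifier

err_tracked = np.linalg.norm(W - true_means, axis=1).max()
err_static = np.linalg.norm(W_source - true_means, axis=1).max()
```

In this toy stream the tracked prototypes should stay close to the drifting class means, while the frozen source prototypes fall behind as the shift accumulates.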

Paper Structure

This paper contains 77 sections, 29 equations, 8 figures, 16 tables, and 3 algorithms.

Figures (8)

  • Figure 1: STAD adapts to distribution shifts by inferring dynamic class prototypes ${\mathbf{w}}_{t,k}$ for each class $k$ (different colors) at each test time point. It operates on the representation space of the penultimate layer.
  • Figure 2: STAD-vMF: Representations lie on the unit sphere. STAD adapts to the distribution shift, induced by changing demographics and styles, by directing the last-layer weights ${\mathbf{w}}_{t,k}$ towards the representations ${\mathbf{H}}_{t}$.
  • Figure 3: Accuracy over time for TempTTA: STAD mitigates distribution shifts by improving up to 10 points over the source model (Yearbook, 1980s). Some baselines perform similarly, shown by overlaying accuracy trajectories.
  • Figure 4: Batch size effects under covariate shift (first row) and additional label shift (second row): STAD-vMF (dark blue) shows robustness to small batches, with a sweet spot around batch size 16 for label shift on EVIS and FMoW-Time.
  • Figure 5: Cluster fidelity on CIFAR-10-C
  • ...and 3 more figures
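
The hyperspherical picture in the Figure 2 caption can be sketched as follows. This is a simplified illustration rather than the paper's variational EM: it assumes fixed concentrations and uses the standard conjugacy of the vMF distribution for the mean direction, so each prototype moves to the normalized sum of its previous direction and the softly assigned features. The names `vmf_step`, `kappa_trans`, and `kappa_obs` are hypothetical.

```python
import numpy as np

def normalize(X):
    """Project rows onto the unit sphere."""
    return X / np.linalg.norm(X, axis=-1, keepdims=True)

def vmf_step(W, H, kappa_trans=50.0, kappa_obs=20.0):
    """One update of unit-norm prototypes W (K, D) from unlabeled features H (N, D).

    kappa_trans: how strongly a prototype sticks to its previous direction.
    kappa_obs: how concentrated features are around their class prototype.
    Both are assumed fixed here; they are illustrative values only.
    """
    H = normalize(H)
    # Soft assignment: softmax over scaled cosine similarities.
    logits = kappa_obs * (H @ W.T)                    # (N, K)
    logits -= logits.max(axis=1, keepdims=True)
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)
    # Natural parameter of the vMF posterior over each mean direction.
    eta = kappa_trans * W + kappa_obs * (resp.T @ H)  # (K, D)
    return normalize(eta), resp

# Toy stream: two opposite directions on the unit circle rotate slowly.
rng = np.random.default_rng(0)
angles = np.array([0.0, np.pi])
W_source = np.stack([np.cos(angles), np.sin(angles)], axis=1)
W = W_source.copy()
for t in range(30):
    angles = angles + 0.05
    true_dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    labels = np.repeat([0, 1], 8)                     # used only to simulate data
    H = true_dirs[labels] + 0.1 * rng.normal(size=(16, 2))
    W, resp = vmf_step(W, H)

cos_tracked = (W * true_dirs).sum(axis=1)        # alignment after tracking
cos_static = (W_source * true_dirs).sum(axis=1)  # alignment of frozen prototypes
```

A larger `kappa_trans` makes the prototypes turn more slowly, which matches the assumption of gradual dynamics; a smaller value lets them follow each batch more closely at the cost of more noise.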