Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting

Behraj Khan, Tahir Qasim Syed

TL;DR

M-FISHER addresses the challenge of covariate shift in streaming vision-language models by unifying a martingale-based shift detector with Fisher-preconditioned prompt updates. The method provides time-uniform false-alarm control via Ville's inequality and links detection speed to the post-shift information gain through a bound involving $\Gamma$. On adaptation, prompts are updated with natural-gradient steps that minimize local KL divergence, yielding stable, parameterization-invariant updates. Empirically, M-FISHER improves both detection metrics and post-shift calibration across multiple domain-shift benchmarks, while reducing unnecessary updates through statistically grounded triggering. This work offers a principled, anytime-valid framework for robust, sequential deployment of VLMs in dynamic environments.
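
For reference, Ville's inequality in the form usually invoked for this kind of time-uniform guarantee is given here. The symbols $M_t$ and $\delta$ are notation chosen for this note, and identifying the threshold $\tau$ of Figure 1 with $1/\delta$ is an assumption, not something stated in the summary. If $(M_t)_{t \ge 0}$ is a nonnegative supermartingale with $M_0 = 1$ under the no-shift null, then for any $\delta \in (0, 1)$

$$
\Pr\left(\exists\, t \ge 0 : M_t \ge 1/\delta\right) \le \delta .
$$

Raising an alarm the first time $M_t$ reaches $1/\delta$ therefore keeps the false-alarm probability at or below $\delta$ over the entire stream, with no correction for the number of time steps inspected.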

Abstract

We present a theoretical framework for M-FISHER, a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain time-uniform guarantees on false alarm control, ensuring statistical validity at any stopping time. Under sustained shifts, we further bound the expected detection delay as $\mathcal{O}(\log(1/\delta)/\Gamma)$, where $\Gamma$ reflects the post-shift information gain, thereby linking detection efficiency to distributional divergence. For adaptation, we show that Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold, yielding locally optimal updates that minimize KL divergence while preserving stability and parameterization invariance. Together, these results establish M-FISHER as a principled approach for robust, anytime-valid detection and geometrically stable adaptation in sequential decision-making under covariate shift.
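
The detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that the non-conformity scores are standard normal under the no-shift null, so that the log moment generating function at $\lambda$ is $\lambda^2/2$; the function name `detect_shift` and the parameters `lam` and `delta` are hypothetical.

```python
import math
from typing import Iterable, Optional


def detect_shift(scores: Iterable[float], lam: float = 0.5,
                 delta: float = 0.05) -> Optional[int]:
    """Return the 1-based index of the first alarm, or None if no alarm fires.

    Tracks log M_t = sum_i (lam * s_i - lam**2 / 2), the log of an
    exponential martingale under the ASSUMED N(0, 1) null for the scores.
    By Ville's inequality, alarming when M_t >= 1/delta bounds the
    false-alarm probability by delta uniformly over time.
    """
    log_threshold = math.log(1.0 / delta)
    log_m = 0.0  # log M_0 = 0, i.e. M_0 = 1
    for t, s in enumerate(scores, start=1):
        # Subtracting the null log-mgf (lam^2 / 2) keeps E[M_t / M_{t-1}] = 1.
        log_m += lam * s - 0.5 * lam * lam
        if log_m >= log_threshold:
            return t
    return None


if __name__ == "__main__":
    # Constant post-shift score of 2.0: each step adds 0.875 to log M_t.
    print(detect_shift([2.0] * 10))
    # Scores at the null mean never trigger an alarm.
    print(detect_shift([0.0] * 100))
```

In this toy setting, a constant post-shift score $\mu = 2$ gives a per-step drift of $\lambda\mu - \lambda^2/2 = 0.875$ in $\log M_t$, so with $\delta = 0.05$ the alarm fires on the 4th sample, consistent with $\log(1/\delta)/0.875 \approx 3.4$. The drift plays the role that $\Gamma$ plays in the delay bound, though the paper's exact definition of $\Gamma$ may differ.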

Paper Structure

This paper contains 47 sections, 4 theorems, 21 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Theorem A.1

Let $p_\theta(y|x)$ be the probabilistic model of a VLM, where $\theta = (\theta_f, P)$ includes a frozen backbone $\theta_f$ and learnable prompts $P \in \mathbb{R}^k$. Let $\mathcal{L}(P) = \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{test}}} [\ell(P, x)]$ be a loss function defined over the test distribution. For a sufficiently small step size $\eta > 0$, the natural gradient update rule where $F_P^{+}$ is …
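
A Fisher-preconditioned prompt update of the kind this theorem concerns can be sketched as follows. The sketch assumes the standard natural-gradient form $P \leftarrow P - \eta\, F_P^{+} \nabla_P \mathcal{L}(P)$ with $F_P^{+}$ read as the Moore-Penrose pseudoinverse of the Fisher matrix, and it estimates the Fisher matrix by the empirical average of outer products of per-sample gradients. Both choices, and the function names `empirical_fisher` and `natural_gradient_step`, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def empirical_fisher(per_sample_grads: np.ndarray) -> np.ndarray:
    """Average outer product of per-sample gradients, shape (k, k).

    `per_sample_grads` has shape (n, k): one gradient of the per-sample
    log-likelihood with respect to the k prompt parameters per row.
    This empirical estimate is an assumption of the sketch.
    """
    g = np.asarray(per_sample_grads, dtype=float)
    return g.T @ g / g.shape[0]


def natural_gradient_step(P: np.ndarray, grad: np.ndarray,
                          fisher: np.ndarray, eta: float = 0.1) -> np.ndarray:
    """One step P - eta * pinv(F) @ grad.

    The pseudoinverse leaves directions with zero Fisher information
    untouched instead of dividing by zero.
    """
    return P - eta * np.linalg.pinv(fisher) @ grad


if __name__ == "__main__":
    P = np.zeros(2)
    fisher = np.diag([4.0, 1.0])
    grad = np.array([4.0, 1.0])
    # The preconditioner rescales each coordinate by its curvature.
    print(natural_gradient_step(P, grad, fisher, eta=0.5))
```

Using the pseudoinverse rather than a plain inverse keeps the step well defined when the Fisher matrix is rank-deficient, which is plausible when the number of prompt parameters exceeds the number of samples used for the estimate.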

Figures (3)

  • Figure 1: Sequential shift detection and adaptation in M-FISHER. The prediction distribution (blue) is monitored via non-conformity scores. When the martingale statistic exceeds threshold $\tau$, Fisher-natural gradient updates stabilize prompt adaptation while minimizing KL divergence.
  • Figure 2: Ablation analysis on ImageNet-C sequential shift. Accuracy (gray) and Expected Calibration Error (blue) are both shown in percentage form. Exact values are displayed above each bar.
  • Figure 3: t-SNE visualization of text prompt embeddings before and after M-FISHER adaptation on Office-Home (Product $\rightarrow$ Art shift).

Theorems & Definitions (6)

  • Theorem A.1
  • Proof A.1
  • Proposition A.1: Conditional validity with bootstrap mgf correction
  • Proof A.2: Sketch
  • Theorem A.2: IID case: asymptotic detection delay
  • Proposition A.2: Dependent case: ergodic/mixing processes