Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting
Behraj Khan, Tahir Qasim Syed
TL;DR
M-FISHER addresses covariate shift in streaming vision-language models (VLMs) by unifying a martingale-based shift detector with Fisher-preconditioned prompt updates. The method provides time-uniform false-alarm control via Ville's inequality and bounds the expected detection delay in terms of the post-shift information gain $\Gamma$, so that larger shifts are detected faster. For adaptation, prompts are updated with natural-gradient steps that minimize local KL divergence, yielding stable, parameterization-invariant updates. Empirically, M-FISHER improves both detection metrics and post-shift calibration across multiple domain-shift benchmarks, while reducing unnecessary updates through statistically grounded triggering. This work offers a principled, anytime-valid framework for robust, sequential deployment of VLMs in dynamic environments.
Abstract
We present a theoretical framework for M-FISHER, a method for sequential distribution-shift detection and stable adaptation on streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain a time-uniform bound on the false-alarm probability, so the guarantee remains valid at any stopping time. Under sustained shifts, we further bound the expected detection delay as $\mathcal{O}(\log(1/\delta)/\Gamma)$, where $\delta$ is the false-alarm level and $\Gamma$ is the post-shift information gain, thereby linking detection efficiency to distributional divergence. For adaptation, we show that Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold, yielding locally optimal updates that minimize KL divergence while preserving stability and parameterization invariance. Together, these results establish M-FISHER as a principled approach for robust, anytime-valid detection and geometrically stable adaptation in sequential decision-making under covariate shift.
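To make the detection guarantee concrete, the following is a minimal sketch of an exponential-martingale detector with a Ville-type threshold. It covers detection only, not the prompt update. The specific construction is an assumption for illustration and is not taken from the method itself: non-conformity scores are assumed to have already been converted to p-values that are i.i.d. Uniform(0, 1) before the shift, the bet is the exponential tilt $\exp(\lambda(0.5 - u) - \psi(\lambda))$, and the names `detect_shift`, `log_mgf_uniform`, and the fixed tilt `lam` are hypothetical.

```python
import math
import random


def log_mgf_uniform(lam):
    """psi(lam) = log E[exp(lam * (0.5 - U))] for U ~ Uniform(0, 1), lam != 0."""
    return math.log(2.0 * math.sinh(lam / 2.0) / lam)


def detect_shift(p_values, delta=0.01, lam=2.0):
    """Run an exponential martingale over a stream of p-values.

    Assumption (illustrative): before any shift the p-values are i.i.d.
    Uniform(0, 1), so M_t = exp(sum_i [lam * (0.5 - u_i) - psi(lam)]) is a
    nonnegative martingale with M_0 = 1. Ville's inequality then gives
    P(sup_t M_t >= 1/delta) <= delta, uniformly over time.

    Returns the first 1-based index at which M_t >= 1/delta, or None.
    """
    threshold = math.log(1.0 / delta)  # compare in log space for stability
    psi = log_mgf_uniform(lam)
    log_m = 0.0
    for t, u in enumerate(p_values, start=1):
        log_m += lam * (0.5 - u) - psi
        if log_m >= threshold:
            return t
    return None


if __name__ == "__main__":
    random.seed(0)
    # 200 pre-shift p-values (uniform), then post-shift p-values drawn from
    # Beta(0.3, 1), which concentrates mass near 0 (a hypothetical shift).
    pre = [random.random() for _ in range(200)]
    post = [random.random() ** (1.0 / 0.3) for _ in range(200)]
    print("alarm index:", detect_shift(pre + post, delta=0.01, lam=2.0))
```

In this toy setting the expected per-step log-growth after the shift, $\lambda(0.5 - \mathbb{E}[u]) - \psi(\lambda)$, plays the role of $\Gamma$: the larger it is, the fewer steps are needed to reach $\log(1/\delta)$. Note that this plain martingale drifts downward before the shift, which adds to the delay on long pre-shift streams; how that drift is handled is not specified here.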
