Exploring Structured Semantic Priors Underlying Diffusion Score for Test-time Adaptation

Mingjia Li, Shuang Li, Tongrui Su, Longhui Yuan, Jian Liang, Wei Li

TL;DR

This work tackles robust test-time adaptation (TTA) by bridging discriminative models with the rich semantic priors embedded in diffusion score models. It introduces DUSA, a single-timestep, priors-based objective that aligns the diffusion noise predictor with a pre-trained task model through a theoretically grounded, unbiased estimator $\mathcal{L}_{\text{DUSA}}(\theta, \phi)$. The authors derive a key score-function identity, $\nabla_{\mathbf{x}}\log p(\mathbf{x}) = \sum_y p(y|\mathbf{x})\nabla_{\mathbf{x}}\log p(\mathbf{x}|y)$, and connect it to Tweedie’s formula to justify using conditional diffusion predictions as discriminative priors, including a per-pixel extension for segmentation. Three practical designs (timestep selection, candidate class pruning via LogitNorm, and budgeted computation) enable efficient adaptation that rivals or surpasses diffusion-based baselines on ImageNet-C and ADE20K-C, with demonstrated benefits in both fully and continual TTA settings. The approach generalizes to dense prediction and maintains strong performance with modest computational overhead, making diffusion priors more accessible for real-world discriminative tasks.

Abstract

Capitalizing on the complementary advantages of generative and discriminative models has always been a compelling vision in machine learning, backed by a growing body of research. This work discloses the hidden semantic structure within score-based generative models, unveiling their potential as effective discriminative priors. Inspired by our theoretical findings, we propose DUSA to exploit the structured semantic priors underlying diffusion score to facilitate the test-time adaptation of image classifiers or dense predictors. Notably, DUSA extracts knowledge from a single timestep of denoising diffusion, lifting the curse of Monte Carlo-based likelihood estimation over timesteps. We demonstrate the efficacy of our DUSA in adapting a wide variety of competitive pre-trained discriminative models on diverse test-time scenarios. Additionally, a thorough ablation study is conducted to dissect the pivotal elements in DUSA. Code is publicly available at https://github.com/BIT-DA/DUSA.

Paper Structure (51 sections, 4 theorems, 37 equations, 5 figures, 6 tables, 1 algorithm)

Key Result

Proposition 1

Let $p(\mathbf{x})$ and $\{p(\mathbf{x}\mid y): y\in\mathcal{Y}\}$ be continuously differentiable probability densities with score functions $\nabla_\mathbf{x}\log p(\mathbf{x})$ and $\{\nabla_\mathbf{x}\log p(\mathbf{x}\mid y): y\in\mathcal{Y}\}$. Then the following equation holds:

$$\nabla_{\mathbf{x}}\log p(\mathbf{x}) = \sum_{y\in\mathcal{Y}} p(y\mid\mathbf{x})\,\nabla_{\mathbf{x}}\log p(\mathbf{x}\mid y)$$
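
The identity in Proposition 1, which says the marginal score equals the posterior-weighted average of the class-conditional scores, can be checked numerically on a toy problem. The sketch below is only an illustration and is not taken from the paper: it assumes a hypothetical one-dimensional, two-class Gaussian mixture (the `priors`, `means`, and `stds` values are made up) and compares a finite-difference estimate of the marginal score with the weighted sum of analytic conditional scores.

```python
import math

# Hypothetical two-class Gaussian mixture; all values are illustrative assumptions.
priors = [0.3, 0.7]   # p(y)
means = [-1.0, 2.0]   # mean of p(x | y)
stds = [0.8, 1.5]     # standard deviation of p(x | y)


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def log_marginal(x):
    """log p(x) with p(x) = sum_y p(y) p(x | y)."""
    return math.log(sum(w * gaussian_pdf(x, m, s)
                        for w, m, s in zip(priors, means, stds)))


def marginal_score_numeric(x, h=1e-5):
    """Left-hand side: central finite difference of log p(x)."""
    return (log_marginal(x + h) - log_marginal(x - h)) / (2.0 * h)


def posterior_weighted_score(x):
    """Right-hand side: sum_y p(y | x) * d/dx log p(x | y)."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(priors, means, stds)]
    evidence = sum(joint)
    posterior = [j / evidence for j in joint]                   # Bayes' rule
    cond_scores = [-(x - m) / s ** 2 for m, s in zip(means, stds)]  # Gaussian score
    return sum(p * g for p, g in zip(posterior, cond_scores))


for x in (-2.0, 0.5, 3.0):
    lhs = marginal_score_numeric(x)
    rhs = posterior_weighted_score(x)
    print(f"x={x:+.1f}  lhs={lhs:+.6f}  rhs={rhs:+.6f}")
```

The two columns should agree up to finite-difference error. The posterior $p(y\mid\mathbf{x})$ appears as the mixing weight, which is the role the task model's predicted probabilities play when conditional diffusion predictions are ensembled.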

Figures (5)

  • Figure 1: Overview of DUSA. Our method adapts a discriminative task model $f_\theta$ with a generative diffusion model $\bm{\epsilon}_\phi$. Given image $\mathbf{x}_0$ at test-time, the task model outputs logits. To improve efficiency, we devise a CSM to select classes to adapt and return their probabilities (probs). The embeddings of the classes are then queried as diffusion model conditions, yielding conditional noise predictions from noisy image $\mathbf{x}_t$. The aggregated noise $\tilde{\bm{\epsilon}}_{\theta,\phi}$ is then constructed from ensembling conditional noises with probs, which is aligned with the added noise $\bm{\epsilon}$ following Eq. \ref{eq:objective}. Both models are updated.
  • Figure 2: Visualization of segmentation results on ADE20K-C. From left to right: clean and corrupted images, results of the source model, BN Adapt, Tent, CoTTA, our DUSA, and ground-truth labels.
  • Figure 3: Accuracy of ConvNeXt-L across different selections of timestep.
  • Figure 4: Accuracy of ViT-B/16 on JPEG and ResNet-50 on Contrast, across different budgets for adaptation.
  • Figure 5: Visualization of test-time semantic segmentation results on ADE20K-C. From left to right: clean image from ADE20K, corrupted version of the image, results from source model, BN Adapt, Tent, CoTTA, our DUSA, and lastly the ground truth. DUSA results exhibit a favorable visual effect.
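
The pipeline described in the Figure 1 caption can be sketched roughly as follows. This is a minimal, framework-free illustration under several assumptions, not the authors' implementation: `select_candidates` is a hypothetical top-k stand-in for the CSM (renormalizing the kept probabilities is also an assumption), `toy_noise_predictor` is a placeholder for the conditional noise predictor $\bm{\epsilon}_\phi$, the noisy input uses the standard DDPM forward process with an assumed `alpha_bar_t`, and the alignment is written as a mean squared error.

```python
import math
import random


def softmax(logits):
    """Convert task-model logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_candidates(probs, k):
    """Hypothetical stand-in for the CSM: keep the top-k classes, renormalize."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return top, [probs[i] / mass for i in top]


def toy_noise_predictor(x_t, class_id):
    """Placeholder for eps_phi(x_t, t, c_y); NOT a real diffusion model."""
    return [math.tanh(v + 0.1 * class_id) for v in x_t]


def dusa_style_loss(x0, logits, alpha_bar_t, k, rng):
    """Single-timestep alignment between added noise and aggregated noise."""
    # Sample the noise and form x_t with the standard DDPM forward process.
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]

    # Task-model probabilities, restricted to the selected candidate classes.
    classes, weights = select_candidates(softmax(logits), k)

    # Aggregated noise: probability-weighted ensemble of conditional predictions.
    agg = [0.0] * len(x0)
    for c, w in zip(classes, weights):
        pred = toy_noise_predictor(x_t, c)
        agg = [g + w * p for g, p in zip(agg, pred)]

    # Mean squared error between the added noise and the aggregated noise.
    loss = sum((e - g) ** 2 for e, g in zip(eps, agg)) / len(x0)
    return loss, classes, weights


rng = random.Random(0)
loss, classes, weights = dusa_style_loss(
    x0=[0.2, -0.5, 1.0, 0.3],            # toy "image" with four pixels
    logits=[2.0, 0.1, -1.0, 1.5, 0.0],   # toy task-model logits for five classes
    alpha_bar_t=0.9, k=3, rng=rng)
print(classes, [round(w, 3) for w in weights], round(loss, 4))
```

In an actual setup the logits and the noise predictions would come from differentiable models, so the gradient of such a loss would reach the task model through the weights and the diffusion model through the conditional predictions, consistent with the caption's note that both models are updated.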

Theorems & Definitions (8)

  • Proposition 1
  • Remark
  • Lemma 1: Tweedie's Formula
  • Corollary 1
  • Corollary 2
  • proof
  • proof
  • proof