ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse

Guohao Chen, Shuaicheng Niu, Deyu Chen, Jiahao Yang, Zitian Zhang, Mingkui Tan, Pengcheng Wu, Zhiqi Shen

TL;DR

This paper reveals asymmetry as a key mechanism for collapse prevention and introduces ZeroSiam, an efficient asymmetric Siamese architecture tailored for test-time entropy minimization. ZeroSiam demonstrates efficacy on both vision adaptation and large language model reasoning tasks across challenging test scenarios and diverse models, including tiny models that are particularly prone to collapse.

Abstract

Test-time entropy minimization helps adapt a model to novel environments and incentivizes its reasoning capability, unleashing the model's potential during inference by allowing it to evolve and improve in real time using its own predictions. However, pure entropy minimization can favor non-generalizable shortcuts, such as inflating the logit norm and driving all predictions to a dominant class to reduce entropy, risking collapsed solutions (e.g., constant one-hot outputs) that trivially minimize the objective without meaningful learning. In this paper, we reveal asymmetry as a key mechanism for collapse prevention and introduce ZeroSiam, an efficient asymmetric Siamese architecture tailored for test-time entropy minimization. ZeroSiam prevents collapse through asymmetric divergence alignment, efficiently achieved by a learnable predictor and a stop-gradient operator before the classifier. We provide empirical and theoretical evidence that ZeroSiam not only prevents collapse but also regularizes biased learning signals, enhancing performance even when no collapse occurs. Despite its simplicity, extensive results show that ZeroSiam performs more stably than prior methods with negligible overhead, demonstrating efficacy on both vision adaptation and large language model reasoning tasks across challenging test scenarios and diverse models, including tiny models that are particularly prone to collapse.
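The logit-norm shortcut mentioned in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the logit values and the helper names `softmax` and `entropy` are assumptions chosen for illustration. Rescaling a fixed logit vector lowers the prediction entropy while leaving the predicted class unchanged, which is why entropy alone can be reduced without any meaningful learning.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical logits for a 3-class problem (illustrative values only).
logits = [2.0, 1.0, 0.5]

for scale in (1.0, 5.0, 25.0):
    scaled = [scale * z for z in logits]
    probs = softmax(scaled)
    predicted = probs.index(max(probs))
    # Larger scale: same predicted class, lower entropy.
    print(f"scale={scale:5.1f}  class={predicted}  entropy={entropy(probs):.4f}")
```

The predicted class never changes across scales, so the drop in entropy reflects only growing confidence, not a better decision.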

Paper Structure

This paper contains 38 sections, 3 theorems, 49 equations, 11 figures, 21 tables, 1 algorithm.

Key Result

Theorem 1

(Optimization and Stability of ZeroSiam) Consider the ZeroSiam objective $\mathcal{L} = H(p^o) + \alpha \, D\!\left(p^o \,\|\, \mathrm{sg}[p^r]\right)$, where $H(\cdot)$ denotes the entropy loss, $D(\cdot)$ the alignment regularizer, and $p^o, p^r \in \Delta^{|\mathcal{C}|-1}$ are the probability di… (2) For $\alpha > 0$, the predictor $h$ serves as a filtering mechanism that suppresses gradient up…
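As a rough numerical illustration of the objective in Theorem 1, the sketch below evaluates $H(p^o) + \alpha \, D(p^o \,\|\, \mathrm{sg}[p^r])$ for two given probability vectors. Several points are assumptions rather than statements from the source: the alignment regularizer $D$ is instantiated as the KL divergence, the function names are hypothetical, and the example logits are arbitrary. The stop-gradient $\mathrm{sg}[\cdot]$ only affects backpropagation, so it does not appear in this forward-value computation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def kl_divergence(p, q):
    """KL(p || q); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def zerosiam_objective_value(p_o, p_r, alpha):
    """Forward value of H(p_o) + alpha * D(p_o || sg[p_r]).

    D is taken to be the KL divergence here (an assumption).
    sg[.] blocks gradients only, so p_r is used as a plain constant.
    """
    return entropy(p_o) + alpha * kl_divergence(p_o, p_r)


# Hypothetical branch outputs for a 3-class problem.
p_o = softmax([2.0, 1.0, 0.5])
p_r = softmax([1.0, 1.0, 1.0])
print(zerosiam_objective_value(p_o, p_r, alpha=0.5))
```

When the two branches agree ($p^o = p^r$), the alignment term vanishes and the value reduces to the plain entropy loss. With $\alpha = 0$ the same reduction holds for any $p^r$.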

Figures (11)

  • Figure 1: Comparison of architectures. (a) Alignment-oriented SSL methods: BYOL (Grill et al., 2020) and SimSiam (Chen & He, 2021). (b) Test-time entropy minimization: Tent (Wang et al., 2021). (c) Our ZeroSiam, which designs a minimal asymmetry for entropy minimization with a lightweight predictor and a stop-gradient branch (without augmentations, extra encoder passes, or teacher models) to substantially enhance learning stability and boost performance while retaining efficiency.
  • Figure 2: Empirical evidence of ZeroSiam's stabilization effects. (a) records the Frobenius distance between $\theta_h$ and the identity matrix under non-i.i.d. streams with varying imbalance ratios (Niu et al., 2023). (b-d) record the OOD accuracy, logit $L_2$ norm, and center dominance of model predictions under a mild test scenario (Wang et al., 2021). Center dominance is calculated as $\|\overline{u}\|/\|u\|$ following Zhang et al. (2022). Experiments are conducted on ImageNet-C (Snow, level 5) with ResNet50-GN. For a fair comparison, ZeroSiam and Tent use the same learning rate configuration.
  • Figure 3: Resistance to learning from noise. Models pre-adapt on $N$ pure Gaussian noise, then run TTA on ImageNet-C (level 5).
  • Figure 4: Sensitivity to learning rates. Results are reported on ImageNet-C (level 5) with ViT-Base under label shifts w.r.t. accuracy.
  • Figure : ZeroSiam: Test-Time Asymmetric Entropy Minimization.
  • ...and 6 more figures
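
The two scalar diagnostics named in the Figure 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code. The caption does not say which vectors $u$ are used or how $\|u\|$ is aggregated over samples. Here `center_dominance` takes the ratio of the norm of the mean vector to the mean per-sample norm, and both function names are hypothetical.

```python
import math


def frobenius_distance_to_identity(matrix):
    """Frobenius norm of (matrix - I) for a square list-of-lists matrix."""
    n = len(matrix)
    total = 0.0
    for i in range(n):
        for j in range(n):
            target = 1.0 if i == j else 0.0
            total += (matrix[i][j] - target) ** 2
    return math.sqrt(total)


def l2_norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def center_dominance(vectors):
    """Ratio ||mean(u)|| / mean(||u||) over a batch of vectors.

    The aggregation of ||u|| as a mean over samples is an assumption;
    the caption only gives the ratio in symbolic form.
    """
    count = len(vectors)
    dim = len(vectors[0])
    center = [sum(v[d] for v in vectors) / count for d in range(dim)]
    mean_norm = sum(l2_norm(v) for v in vectors) / count
    return l2_norm(center) / mean_norm


# A predictor weight that has drifted slightly from the identity.
theta_h = [[1.0, 0.1], [0.0, 0.9]]
print(frobenius_distance_to_identity(theta_h))

# Identical vectors are fully dominated by their common center.
print(center_dominance([[1.0, 0.0], [1.0, 0.0]]))
```

Under this reading, a value near 1 means the batch is dominated by a shared direction, as in a collapsed model, while a value near 0 means the per-sample vectors largely cancel out.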

Theorems & Definitions (7)

  • Theorem 1
  • Remark 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • proof