Gradient Alignment Improves Test-Time Adaptation for Medical Image Segmentation

Ziyang Chen; Yiwen Ye; Yongsheng Pan; Yong Xia

TL;DR

The paper addresses domain shift in medical image segmentation by rethinking test-time adaptation (TTA). It introduces GraTa, a gradient alignment framework that combines an auxiliary gradient from an entropy loss with a pseudo gradient from a consistency objective. The model is first shifted to $\theta'=\theta-\nabla\mathcal{L}_{ent}(\theta;\mathcal{X}^t_i)$, and $\mathcal{L}_{con}$ is then optimized on the shifted parameter. A dynamic learning rate is derived from the cosine similarity between the two gradients, $\eta=\beta\cdot\text{Cus}\left(\frac{\nabla\mathcal{L}_{con}(\theta';\mathcal{X}^t_i)\cdot\nabla\mathcal{L}_{ent}(\theta;\mathcal{X}^t_i)}{\|\nabla\mathcal{L}_{con}(\theta';\mathcal{X}^t_i)\|\,\|\nabla\mathcal{L}_{ent}(\theta;\mathcal{X}^t_i)\|}\right)$, with $\text{Cus}(x)=\frac{1}{4}(x+1)^2$, which maps the cosine similarity from $[-1,1]$ to $[0,1]$. Extensive experiments on cross-domain optic disc/optic cup (OD/OC) segmentation show that GraTa outperforms state-of-the-art TTA methods, supporting the value of gradient alignment and a dynamic learning rate for robust test-time segmentation in clinical settings. The approach improves the Dice score across target domains, showing the practical benefit of aligning gradient directions and adaptively scaling updates during inference.
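The dynamic learning rate can be sketched in a few lines of Python. This is a minimal illustration assuming both gradients have already been flattened into plain lists of floats; the function names `cus`, `cosine_similarity`, and `dynamic_lr` and the value of β used below are hypothetical, not taken from the GraTa code.

```python
import math
from typing import Sequence


def cus(x: float) -> float:
    """Cus(x) = (1/4)(x + 1)^2: maps a cosine similarity in [-1, 1] to [0, 1]."""
    return 0.25 * (x + 1.0) ** 2


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity of two flattened gradient vectors (eps guards zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def dynamic_lr(grad_con: Sequence[float], grad_ent: Sequence[float], beta: float) -> float:
    """eta = beta * Cus(cos(grad L_con(theta'), grad L_ent(theta)))."""
    return beta * cus(cosine_similarity(grad_con, grad_ent))


# Hypothetical gradients and base learning rate, for illustration only.
print(dynamic_lr([1.0, 0.0], [1.0, 0.0], beta=0.01))   # aligned gradients
print(dynamic_lr([1.0, 0.0], [0.0, 1.0], beta=0.01))   # orthogonal gradients
print(dynamic_lr([1.0, 0.0], [-1.0, 0.0], beta=0.01))  # opposed gradients
```

Aligned gradients keep the full base rate β, orthogonal gradients receive β/4, and opposed gradients receive a step size of zero, so the update shrinks automatically when the two objectives disagree.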

Abstract

Although recent years have witnessed significant advances in medical image segmentation, the pervasive domain shift among medical images acquired at different centres hinders the effective deployment of pre-trained models. Many test-time adaptation (TTA) methods address this issue by fine-tuning pre-trained models on test data during inference. These methods, however, often suffer from unsatisfactory optimization because of a suboptimal optimization direction (dictated by the gradient) and a fixed step size (determined by the learning rate). In this paper, we propose the Gradient alignment-based Test-time adaptation (GraTa) method to improve both the gradient direction and the learning rate in the optimization procedure. Unlike conventional TTA methods, which primarily optimize the pseudo gradient derived from a self-supervised objective, our method combines an auxiliary gradient with the pseudo gradient to achieve gradient alignment. Such alignment enables the model to exploit the similarities between the two gradients and to correct the gradient direction so that it approximates the empirical gradient of the current segmentation task. Additionally, we design a dynamic learning rate based on the cosine similarity between the pseudo and auxiliary gradients, enabling adaptive fine-tuning of pre-trained models on diverse test data. Extensive experiments demonstrate the effectiveness of the proposed gradient alignment and dynamic learning rate, and show that GraTa outperforms other state-of-the-art TTA methods on a benchmark medical image segmentation task. The code and the weights of the pre-trained source models are available at https://github.com/Chen-Ziyang/GraTa.
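One way to see why optimizing the consistency loss on a shifted parameter promotes gradient alignment is a first-order Taylor expansion. This is a standard argument (the same one used to motivate gradient-matching objectives in meta-learning) and is offered here as intuition under the assumption of a small inner step, not as the paper's own derivation:

```latex
\theta' = \theta - \nabla\mathcal{L}_{ent}(\theta;\mathcal{X}^t_i)

\mathcal{L}_{con}(\theta';\mathcal{X}^t_i)
  \approx \mathcal{L}_{con}(\theta;\mathcal{X}^t_i)
  - \nabla\mathcal{L}_{con}(\theta;\mathcal{X}^t_i)^{\top}\,\nabla\mathcal{L}_{ent}(\theta;\mathcal{X}^t_i)
```

Minimizing the left-hand side therefore lowers the consistency loss while rewarding a large inner product between the pseudo gradient $\nabla\mathcal{L}_{con}$ and the auxiliary gradient $\nabla\mathcal{L}_{ent}$, i.e., alignment. The approximation is only accurate when the inner step is small relative to the curvature of $\mathcal{L}_{con}$.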

Paper Structure (16 sections, 7 equations, 4 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Test-time Adaptation
Gradient Alignment
Dynamic Learning Rate
Method
Problem Definition
The Objective of GraTa
Dynamic Learning Rate
Experiments and Results
Datasets and Evaluation Metrics
Implementation Details
Results
Discussions
Conclusion
...and 1 more section

Figures (4)

Figure 1: Illustration of our motivation. (a) A pseudo gradient $\nabla\mathcal{L}_{pse}(\theta)$, which is primarily used for optimization, may diverge far from the empirical gradient $\nabla\mathcal{L}_{emp}(\theta)$ of the specific task (segmentation in this study). Existing methods typically optimize $\nabla\mathcal{L}_{pse}(\theta)$ directly. (b) Our GraTa introduces an auxiliary gradient $\nabla\mathcal{L}_{aux}(\theta)$ and minimizes the angle between $\nabla\mathcal{L}_{pse}(\theta)$ and $\nabla\mathcal{L}_{aux}(\theta)$, resulting in a novel auxiliary objective, i.e., gradient alignment. (c) Although complete alignment is difficult to achieve because the two gradients serve different objectives, the model can learn to align their task-relevant components $\nabla_\checkmark$ through this auxiliary objective, thereby approximating $\nabla\mathcal{L}_{emp}(\theta)$ and facilitating effective fine-tuning. $\angle$ denotes the angle. 'Obj': abbreviation of 'Objective'.
Figure 2: Overview of our GraTa. For each test image, we first calculate the entropy loss on its prediction and use it to update the pre-trained model $f_{\theta}$ by $\theta^{'}=\theta-\nabla\mathcal{L}_{ent}$. We then apply weak and strong augmentation to the original test image to obtain a set of weak augmentation variants and one strong augmentation variant, and calculate the consistency loss on their predictions produced by $f_{\theta^{'}}$. Finally, the consistency loss is used to fine-tune the model by $\theta^{*}\leftarrow \theta-\eta\nabla\mathcal{L}_{con}$, and the test image is fed into $f_{\theta^{*}}$ for inference.
Figure 3: Qualitative results of our GraTa and eight TTA methods, using Domain D as the source domain and the remaining domains as target domains. We display the results of two easy samples (rows 1-2) and two hard samples (rows 3-4). The ground-truth bounding boxes of OD and OC are overlaid on each segmentation result to highlight potential over- or under-segmentation. Best viewed in color.
Figure 4: Cosine similarity between the pseudo gradient and empirical gradient w/ or w/o our proposed gradient alignment. The objectives of "TENT" and "TENT w/ Align" are $\mathcal{L}_{ent} (\theta;\mathcal{X}^t_i)$ and $\mathcal{L}_{ent} (\theta-\nabla\mathcal{L}_{con} (\theta;\mathcal{X}^t_i);\mathcal{X}^t_i)$, respectively.
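The adaptation step described in the Figure 2 caption can be sketched on a toy problem. This is a minimal sketch under stated assumptions: parameters are a plain list of floats, gradients come from central finite differences instead of autograd, and the two quadratic functions stand in for the real entropy and consistency losses (which in GraTa are computed on segmentation predictions of augmented test images). All names here (`num_grad`, `grata_step`, `loss_ent`, `loss_con`) are hypothetical. Following the caption, the gradient of $\mathcal{L}_{con}$ is evaluated at $\theta'$ and applied to the original $\theta$ with a unit inner step; whether the official code back-propagates through the inner step is not stated here.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Loss = Callable[[Vector], float]


def num_grad(loss: Loss, theta: Vector, eps: float = 1e-6) -> Vector:
    """Central-difference gradient, used here as a stand-in for autograd."""
    grad = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((loss(plus) - loss(minus)) / (2.0 * eps))
    return grad


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def cus(x: float) -> float:
    return 0.25 * (x + 1.0) ** 2


def grata_step(theta: Vector, loss_ent: Loss, loss_con: Loss, beta: float) -> Tuple[Vector, float]:
    # Auxiliary gradient of the entropy loss at the current parameters.
    g_ent = num_grad(loss_ent, theta)
    # Shifted parameters theta' = theta - grad L_ent (unit step, as in Figure 2).
    theta_prime = [t - g for t, g in zip(theta, g_ent)]
    # Pseudo gradient of the consistency loss, evaluated at theta'.
    g_con = num_grad(loss_con, theta_prime)
    # Dynamic learning rate from the cosine similarity of the two gradients.
    eta = beta * cus(cosine(g_con, g_ent))
    # The consistency gradient is applied to the ORIGINAL parameters theta.
    theta_star = [t - eta * g for t, g in zip(theta, g_con)]
    return theta_star, eta


# Hypothetical quadratic stand-ins for the entropy and consistency losses.
def loss_ent(th: Vector) -> float:
    return 0.5 * ((th[0] - 1.0) ** 2 + th[1] ** 2)


def loss_con(th: Vector) -> float:
    return 0.5 * ((th[0] - 2.0) ** 2 + th[1] ** 2)


theta_star, eta = grata_step([0.0, 0.0], loss_ent, loss_con, beta=0.1)
print(theta_star, eta)
```

Here both gradients point the same way, so η is essentially β and θ moves to roughly [0.1, 0.0]. Swapping in a consistency loss that pulls back toward the origin makes the two gradients opposed, the cosine approaches -1, and η shrinks to nearly zero, which is the damping behaviour the dynamic learning rate is designed to provide.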
