Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation

Shishuai Hu; Zehui Liao; Zeyou Liu; Yong Xia

TL;DR

This work tackles cross-center distribution shifts in medical image segmentation by introducing HiTTA, a Human-in-the-loop Test Time Adaptation framework that combines a BN-parameter divergence loss with clinician-corrected feedback. It operates in three stages: a pre-inference stage, in which style augmentation and the divergence loss $\mathcal{L}_{div}$ are used to adapt the BN parameters; an inference stage, in which a clinician corrects each prediction $\hat{y}_i^t$ to $y_i^t$; and a post-inference stage, in which a preference head $\mathcal{H}_{\theta_i^h}$ is trained on the corrected predictions using the segmentation loss $\mathcal{L}_{seg}$, weighted by $1+\mathcal{M}_{div}$ to reflect human feedback. Evaluated on RIGA+, a cross-domain, multi-annotator dataset for optic disc/optic cup (OD/OC) segmentation, with the Dice Similarity Coefficient as the metric, HiTTA outperforms eight baselines. Ablation studies confirm that both the divergence loss and the post-inference human-in-the-loop optimization contribute to this gain. The results indicate that incorporating clinician feedback into TTA improves clinical alignment and generalization across medical centers, offering a path toward more practical, human-aware AI-assisted diagnostic tools in ophthalmic imaging and beyond.
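To make the $1+\mathcal{M}_{div}$ weighting concrete, the following is a minimal sketch, not the authors' implementation. This summary does not give the exact forms of $\mathcal{L}_{div}$, $\mathcal{M}_{div}$, or $\mathcal{L}_{seg}$, so the sketch assumes that $\mathcal{M}_{div}$ is a per-pixel absolute difference between the predictions on an image and on its style-augmented copy, that $\mathcal{L}_{div}$ is the mean of that map, and that $\mathcal{L}_{seg}$ is binary cross-entropy against the clinician-corrected mask. The function names `divergence_map`, `divergence_loss`, and `weighted_seg_loss` are hypothetical.

```python
import math

def divergence_map(pred_orig, pred_aug):
    """Assumed form of M_div: per-pixel |p_orig - p_aug| on flattened probability maps."""
    return [abs(a - b) for a, b in zip(pred_orig, pred_aug)]

def divergence_loss(pred_orig, pred_aug):
    """Assumed form of L_div: mean of the per-pixel divergence map."""
    m_div = divergence_map(pred_orig, pred_aug)
    return sum(m_div) / len(m_div)

def weighted_seg_loss(pred, corrected, m_div, eps=1e-7):
    """Binary cross-entropy (an assumed choice of L_seg) weighted per pixel by 1 + M_div."""
    total = 0.0
    for p, y, m in zip(pred, corrected, m_div):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += (1.0 + m) * bce
    return total / len(pred)

# Toy example: four pixels, one of which changes under style augmentation.
pred_orig = [0.9, 0.6, 0.2, 0.1]   # prediction on the original test image
pred_aug = [0.9, 0.2, 0.2, 0.1]    # prediction on a style-augmented copy
corrected = [1, 1, 0, 0]           # clinician-corrected mask

m_div = divergence_map(pred_orig, pred_aug)
plain = weighted_seg_loss(pred_orig, corrected, [0.0] * len(pred_orig))
weighted = weighted_seg_loss(pred_orig, corrected, m_div)
print(divergence_loss(pred_orig, pred_aug), plain, weighted)
```

Under these assumptions, pixels whose prediction is unstable across styles receive a weight above 1, so the clinician's correction counts most where the domain shift affects the model most. Setting the map to zero recovers the unweighted loss.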

Abstract

Deep learning-based medical image segmentation models often suffer performance degradation when deployed across different medical centers, largely due to discrepancies in data distribution. Test Time Adaptation (TTA) methods, which adapt pre-trained models to test data, have been employed to mitigate such discrepancies. However, existing TTA methods primarily focus on manipulating Batch Normalization (BN) layers or employing prompt and adversarial learning, which may not effectively rectify the inconsistencies arising from divergent data distributions. In this paper, we propose a novel Human-in-the-loop TTA (HiTTA) framework that stands out in two significant ways. First, it exploits the largely overlooked potential of clinician-corrected predictions, integrating these corrections into the TTA process to steer the model towards predictions that match clinical annotation preferences more closely. Second, it introduces a divergence loss, designed specifically to reduce the prediction divergence caused by domain disparities, through careful calibration of the BN parameters. HiTTA thus both adapts to the distribution of the test data and keeps the model's predictions aligned with clinical expectations, which enhances its relevance in a medical context. Extensive experiments on a public dataset demonstrate the superiority of HiTTA over existing TTA methods and show the benefit of integrating human feedback and the divergence loss for the model's performance and adaptability across diverse medical centers.

Paper Structure (12 sections, 4 equations, 3 figures, 2 tables)

Introduction
Method
Problem Definition and Method Overview
Pre-inference Stage
Post-inference Stage
Experiments and Results
Materials and Evaluation Metric
Implementation Details
Comparative Experiments
Ablation Analysis
Conclusion
Acknowledgement

Figures (3)

Figure 1: Comparison of (a) No TTA, (b) Previous TTA Methods, and (c) Proposed HiTTA. Gray arrows show data flow, blue arrows indicate model updates using predictions, and the red arrow highlights optimization with clinician corrections. Unlike previous methods, HiTTA incorporates human-in-the-loop feedback for enhanced performance.
Figure 2: Illustration of the proposed HiTTA framework. It mainly comprises (a) the Pre-inference Stage, the inference stage, and (b) the Post-inference Stage; (c) shows the overall workflow of the framework.
Figure 3: Visualization of segmentation masks predicted by HiTTA and seven existing methods, together with ground truth (GT-R1 and GT-R*).
