FunOTTA: On-the-Fly Adaptation on Cross-Domain Fundus Image via Stable Test-time Training

Qian Zeng, Le Zhang, Yipeng Liu, Ce Zhu, Fan Zhang

TL;DR

The paper addresses domain shifts in fundus image diagnosis across imaging devices and sites by introducing FunOTTA, a training-based test-time adaptation framework that operates online with a frozen feature extractor, a dynamic memory bank, and a prototypical ensemble. It combines a dynamic filtering mechanism, class-conditional estimation, a confidence-guided contrastive loss, and dual-level alignment to keep adaptation stable while limiting harmful prior-knowledge bias. Empirical results on diabetic retinopathy and glaucoma benchmarks show that FunOTTA outperforms a broad range of state-of-the-art test-time adaptation (TTA) methods and remains robust to hyperparameter choices and label shifts. The work supports real-time, privacy-conscious deployment of cross-domain fundus diagnosis models and points to future extensions for open-set medical imaging scenarios.

Abstract

Fundus images are essential for the early screening and detection of eye diseases. While deep learning models using fundus images have significantly advanced the diagnosis of multiple eye diseases, variations in images from different imaging devices and locations (known as domain shifts) pose challenges for deploying pre-trained models in real-world applications. To address this, we propose a novel Fundus On-the-fly Test-Time Adaptation (FunOTTA) framework that effectively generalizes a fundus image diagnosis model to unseen environments, even under strong domain shifts. FunOTTA stands out for its stable adaptation process by performing dynamic disambiguation in the memory bank while minimizing harmful prior knowledge bias. We also introduce a new training objective during adaptation that enables the classifier to incrementally adapt to target patterns with reliable class conditional estimation and consistency regularization. We compare our method with several state-of-the-art test-time adaptation (TTA) pipelines. Experiments on cross-domain fundus image benchmarks across two diseases demonstrate the superiority of the overall framework and individual components under different backbone networks. Code is available at https://github.com/Casperqian/FunOTTA.

Paper Structure (21 sections, 13 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Method overview of FunOTTA for addressing domain shifts in streaming fundus images. Input images $x$ are fed into the source-trained feature extractor $f_\theta$ to obtain latent features, which pass through a memory bank with a dynamic filtering mechanism for feature-level disambiguation. The latent features $f_\theta\left(x\right)$ are then processed by an ensemble learner $h_\phi$ to generate low-dimensional embeddings, which a non-parametric classifier uses to aggregate neighbor information for the final predictions. Consistency alignment and a contrastive learning paradigm are employed to ensure stable adaptation.
  • Figure 2: Data sample and RGB statistics of fundus images across different sites.
  • Figure 3: Distribution of performance improvements achieved by FunOTTA for models trained with different hyperparameters.
  • Figure 4: Effect of the number of ensemble learners across DR and glaucoma datasets. Here we used ResNet50 as the backbone network.
  • Figure 5: Comparison between different filtering mechanisms across two diseases. The results are averaged across five runs, using ResNet50 as the backbone network.
  • ...and 3 more figures
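
The Figure 1 caption describes a memory bank with filtering followed by a non-parametric classifier that aggregates neighbor information. The following is a minimal sketch of that general idea, not the authors' implementation: the class name `MemoryBank`, the rule of keeping the lowest-entropy entries per pseudo-label, the cosine similarity, and the similarity-weighted vote are all assumptions made for illustration, and the paper's actual dynamic filtering mechanism may differ. Embeddings are plain lists standing in for the outputs of $h_\phi$.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def entropy(probs):
    """Shannon entropy of a probability vector; lower means more confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class MemoryBank:
    """Hypothetical bank that keeps the `per_class` most confident entries per pseudo-label."""

    def __init__(self, per_class):
        self.per_class = per_class
        self.entries = defaultdict(list)  # pseudo-label -> [(entropy, embedding), ...]

    def add(self, embedding, probs):
        """Store an embedding under its arg-max pseudo-label, then filter by entropy."""
        label = max(range(len(probs)), key=probs.__getitem__)
        slot = self.entries[label]
        slot.append((entropy(probs), list(embedding)))
        slot.sort(key=lambda entry: entry[0])  # most confident first
        del slot[self.per_class:]              # assumed filtering rule: drop the rest
        return label

    def predict(self, embedding, k, num_classes):
        """Similarity-weighted vote over the k nearest stored embeddings."""
        scored = [
            (cosine(embedding, stored), label)
            for label, slot in self.entries.items()
            for _, stored in slot
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        votes = [0.0] * num_classes
        for similarity, label in scored[:k]:
            votes[label] += max(similarity, 0.0)
        total = sum(votes)
        if total == 0:
            return [1.0 / num_classes] * num_classes  # no usable neighbors
        return [v / total for v in votes]


bank = MemoryBank(per_class=2)
bank.add([1.0, 0.0], [0.9, 0.1])
bank.add([0.9, 0.1], [0.8, 0.2])
bank.add([0.7, 0.3], [0.55, 0.45])  # least confident class-0 entry, filtered out
bank.add([0.0, 1.0], [0.1, 0.9])
print(bank.predict([1.0, 0.05], k=2, num_classes=2))
```

In this toy setting the third sample is evicted because its prediction has the highest entropy in its class, so it never influences later votes. In a real pipeline the embeddings would come from the frozen extractor and the ensemble learner, and the per-class capacity and `k` would be tuned rather than fixed as here.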