TAPE: A two-stage parameter-efficient adaptation framework for foundation models in OCT-OCTA analysis

Xiaofei Su, Zengshuo Wang, Minghe Sun, Xin Zhao, Mingzhu Sun

Abstract

Automated analysis of optical coherence tomography (OCT) and OCT angiography (OCTA) images is critical for robust ophthalmic diagnosis. Existing mainstream methods are trained from scratch and depend heavily on large datasets and large models, which hinders their practical deployment in resource-constrained clinical settings. Although transfer learning based on foundation models (FMs) is promising, it still faces two significant challenges: domain shift and task misalignment. To address these, we propose TAPE: A Two-stage Adaptation Framework via Parameter-Efficient Fine-tuning, which decouples adaptation into domain alignment and task fitting for downstream segmentation. Notably, the domain adaptation stage applies parameter-efficient fine-tuning (PEFT) within masked image modeling for medical image domain adaptation, which is a novel approach to the best of our knowledge. Applied to retinal layer segmentation on both a universal FM (masked autoencoder, MAE) and a specialized FM (RETFound), TAPE demonstrates superior parameter efficiency and achieves state-of-the-art generalization performance across diverse pathologies.
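The two-stage decoupling described in the abstract can be illustrated with a small parameter-bookkeeping sketch. This is not the authors' implementation: the adapter form (a LoRA-style low-rank update), the backbone size (`dim=1024`, `depth=24`), the rank, the choice to count only attention projections, and the choice to freeze the domain adapter during Stage II are all assumptions made for illustration, and every name below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    name: str
    num_params: int
    trainable: bool = False


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A low-rank update W + B @ A (A: rank x d_in, B: d_out x rank)
    # adds rank * (d_in + d_out) parameters.
    return rank * (d_in + d_out)


def build_model(dim: int = 1024, depth: int = 24, rank: int = 8) -> dict:
    # Assumed ViT-like backbone; only the four attention projections
    # per block are counted (4 * dim * dim weights), biases and MLPs omitted.
    backbone = depth * 4 * dim * dim
    adapter = depth * 4 * lora_params(dim, dim, rank)
    return {
        "backbone": ParamGroup("backbone", backbone),
        "domain_adapter": ParamGroup("domain_adapter", adapter),
        "task_adapter": ParamGroup("task_adapter", adapter),
    }


def set_stage(groups: dict, stage: str) -> None:
    # Stage "domain": only the domain adapter is trained (masked image modeling).
    # Stage "task": only the task adapter is trained (segmentation).
    # Assumption: the backbone is frozen in both stages, and the domain
    # adapter is frozen during the task stage.
    active = {"domain": "domain_adapter", "task": "task_adapter"}[stage]
    for group in groups.values():
        group.trainable = group.name == active


def trainable_fraction(groups: dict) -> float:
    total = sum(g.num_params for g in groups.values())
    trainable = sum(g.num_params for g in groups.values() if g.trainable)
    return trainable / total


if __name__ == "__main__":
    groups = build_model()
    for stage in ("domain", "task"):
        set_stage(groups, stage)
        active = [g.name for g in groups.values() if g.trainable]
        print(stage, active, f"{trainable_fraction(groups):.4f}")
```

Under these assumed sizes, each stage updates only a small fraction of the counted weights, which is the sense in which a scheme of this kind is parameter-efficient. The actual figures for TAPE depend on the adapter design and the decoder reported in the paper.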

Paper Structure

This paper contains 11 sections, 3 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: The proposed TAPE architecture, which comprises domain adaptation (Stage I) and task adaptation (Stage II). The domain adapter is trained on a masked image modeling (MIM)-based self-supervised learning (SSL) task to add knowledge of the target data domain to the FM. The task adapter is trained on the downstream task to fit the FM to layer segmentation.
  • Figure 2: TAPE segments retinal layers while achieving superior performance. Rows correspond to the NORMAL class and three disease classes (DR, RVO, and AMD), and columns show labels and segmentation maps generated by TAPE, STL-OCT, STL, FFT-DA, FFT-TA, DLoRA, and TLoRA. Segmentation targets include the internal limiting membrane (ILM), inner plexiform layer (IPL), outer plexiform layer (OPL), inner segment/outer segment (ISOS), retinal pigment epithelium (RPE), and Bruch’s membrane (BM).