JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA

Zeyu Zhang; Xuyin Qi; Mingxi Chen; Guangxi Li; Ryan Pham; Ayub Qassim; Ella Berry; Zhibin Liao; Owen Siggs; Robert Mclaughlin; Jamie Craig; Minh-Son To

TL;DR

This work predicts oxygen saturation status from OCTA images under a severely long-tailed class distribution, addressing a practical need in screening for sleep-related disorders. It introduces JointViT, a Vision Transformer trained with joint supervision on $Y_{\text{cls}}$ and $Y_{\text{val}}$ (SaO$_2$ categories and exact values), together with a balancing augmentation pipeline that mitigates data imbalance. The method couples a pretrained ViT backbone with post-training on Kermany v3 and processes 3D OCTA volumes as 2D slices. It is optimized with the joint loss $L = \lambda L_{\text{bce}}(T(X), Y_{\text{cls}}) + (1 - \lambda) L_{\text{mse}}(T(X), Y_{\text{val}})$ and achieves improvements of up to 12.28% in overall accuracy over state-of-the-art baselines. Results on Prog-OCTA and Kermany v3 show higher sensitivity on minority classes and robust generalization, suggesting that OCTA has practical potential for noninvasive diagnosis of sleep-related disorders.
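To make the joint loss concrete, the following is a minimal pure-Python sketch of $L = \lambda L_{\text{bce}} + (1 - \lambda) L_{\text{mse}}$ for a single sample. It is not the authors' implementation. It assumes that the model output $T(X)$ has already been split into per-class logits and one scalar SaO$_2$ prediction, and that SaO$_2$ values are scaled to $[0, 1]$; the function names and the default `lam=0.5` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(class_logits, y_cls):
    """Mean binary cross-entropy between per-class logits and 0/1 targets."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for z, y in zip(class_logits, y_cls):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(class_logits)


def mse(value_pred, y_val):
    """Squared error between the predicted and true scalar value."""
    return (value_pred - y_val) ** 2


def joint_loss(class_logits, value_pred, y_cls, y_val, lam=0.5):
    """lam * BCE on the class target + (1 - lam) * MSE on the exact value."""
    return lam * bce(class_logits, y_cls) + (1.0 - lam) * mse(value_pred, y_val)


# Assumed toy sample: two classes, one-hot class target, SaO2 scaled to [0, 1].
loss = joint_loss([0.0, 0.0], 0.90, [1, 0], 0.95, lam=0.5)
```

Setting `lam=1.0` reduces the objective to pure classification and `lam=0.0` to pure regression, so $\lambda$ controls how much each supervision signal contributes.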

Abstract

The oxygen saturation level in the blood (SaO2) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO2 is time-consuming and highly variable depending on patients' conditions. Recently, optical coherence tomography angiography (OCTA) has shown promising development in rapidly and effectively screening eye-related lesions, offering the potential for diagnosing sleep-related disorders. To bridge this gap, our paper presents three key contributions. Firstly, we propose JointViT, a novel model based on the Vision Transformer architecture, incorporating a joint loss function for supervision. Secondly, we introduce a balancing augmentation technique during data preprocessing to improve the model's performance, particularly on the long-tail distribution within the OCTA dataset. Lastly, through comprehensive experiments on the OCTA dataset, our proposed method significantly outperforms other state-of-the-art methods, achieving improvements of up to 12.28% in overall accuracy. This advancement lays the groundwork for the future utilization of OCTA in diagnosing sleep-related disorders. See project website https://steve-zeyu-zhang.github.io/JointViT
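The abstract does not spell out how the balancing augmentation works. One common way to balance a long-tailed dataset is to oversample minority classes with augmented copies until every class matches the largest one, and the sketch below illustrates that idea only. The function names, the flip augmentation, and the list-of-rows image format are all assumptions made for illustration, not the procedure used in the paper.

```python
import random
from collections import defaultdict


def horizontal_flip(image, rng):
    """Hypothetical augmentation: mirror a 2D image stored as a list of rows."""
    return [row[::-1] for row in image]


def balance_by_oversampling(samples, labels, augment, seed=0):
    """Add augmented copies of minority-class samples until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep every original sample.
        out_x.extend(xs)
        out_y.extend([y] * len(xs))
        # Fill the gap to the majority count with augmented copies.
        for _ in range(target - len(xs)):
            out_x.append(augment(rng.choice(xs), rng))
            out_y.append(y)
    return out_x, out_y


# Assumed toy data: class "B" dominates, as in a long-tailed distribution.
images = [[[i, i + 1]] for i in range(8)]
classes = ["B"] * 5 + ["A"] + ["C"] * 2
balanced_x, balanced_y = balance_by_oversampling(images, classes, horizontal_flip)
```

In practice the balancing should be applied to the training split only, so that augmented copies of a sample never leak into validation or test data.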

Paper Structure (17 sections, 1 equation, 7 figures, 8 tables)

Introduction
Related Works
Medical Imaging Recognition
Long-Tailed Image Recognition
OCTA in AI for Health
Oxygen Saturation Prediction
Datasets
Prog-OCTA
Kermany v3
Methodology
Balancing Augmentation
JointViT Backbone
Experiments
Evaluation Metrics
Comparative Studies
...and 2 more sections

Figures (7)

Figure 1: Pipeline of the proposed JointViT, which comprises a balancing augmentation step and a plain Vision Transformer supervised by a joint loss. Classes are denoted by numbers in the Kermany v3 dataset [kermany2018identifying] and by letters in Prog-OCTA. GT stands for ground truth.
Figure 2: OCTA instances corresponding to each level of $\text{SaO}_\text{2}$ in the Prog-OCTA dataset.
Figure 3: $\text{SaO}_\text{2}$ values of patients in the Prog-OCTA dataset. The distribution is imbalanced, and the average $\text{SaO}_\text{2}$ is visibly lower than that of healthy individuals.
Figure 4: The figure shows the long-tailed and imbalanced distribution of $\text{SaO}_\text{2}$ classes in Prog-OCTA, with the borderline low class being predominant.
Figure 5: The four categories of OCT images in Kermany v3 [kermany2018identifying]: diabetic macular edema (DME), characterized by fluid accumulation in the macula due to diabetes; choroidal neovascularization (CNV), involving abnormal growth of blood vessels in the retina; drusen, indicated by the accumulation of deposits composed of lipids and proteins in the retina; and normal instances.
...and 2 more figures
