JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA
Zeyu Zhang, Xuyin Qi, Mingxi Chen, Guangxi Li, Ryan Pham, Ayub Qassim, Ella Berry, Zhibin Liao, Owen Siggs, Robert Mclaughlin, Jamie Craig, Minh-Son To
TL;DR
This work tackles predicting oxygen saturation (SaO$_2$) status from OCTA images under a severely long-tailed class distribution, addressing practical sleep-disorder screening needs. It introduces JointViT, a Vision Transformer with a joint supervision scheme on $Y_{\text{cls}}$ and $Y_{\text{val}}$ (SaO$_2$ categories and exact values), together with a balancing augmentation pipeline that mitigates data imbalance. The method couples a pretrained ViT backbone with post-training on Kermany v3 and processes 3D OCTA volumes as 2D slices. It is optimized with the joint loss $L = \lambda L_{\text{bce}}(T(X), Y_{\text{cls}}) + (1 - \lambda) L_{\text{mse}}(T(X), Y_{\text{val}})$, where $T(X)$ is the network output for input $X$ and $\lambda \in [0, 1]$ weights the two terms, and it achieves improvements of up to 12.28% in accuracy over state-of-the-art baselines. Results on Prog-OCTA and Kermany v3 show higher sensitivity on minority classes and robust generalization, suggesting that OCTA has practical potential for noninvasive sleep-disorder diagnostics.
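The joint loss above can be sketched in plain Python as a convex combination of a binary cross-entropy term on the class labels and a mean-squared-error term on the exact values. This is a minimal illustration, not the authors' code: the formula applies both terms to $T(X)$ without saying how the output is split, so we assume a hypothetical class-logit head (`cls_logits`, compared against a one-hot $Y_{\text{cls}}$) and a scalar value head (`val_preds`, compared against SaO$_2$ values assumed to be rescaled to $[0, 1]$). The function names are illustrative.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over independent logits.

    Uses the numerically stable identity
    BCE(sigmoid(z), y) = max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def joint_loss(
    cls_logits: Sequence[float],   # assumed class-logit head output
    cls_onehot: Sequence[float],   # one-hot Y_cls
    val_preds: Sequence[float],    # assumed value head output
    val_targets: Sequence[float],  # Y_val, assumed rescaled to [0, 1]
    lam: float = 0.5,
) -> float:
    """L = lam * L_bce(cls) + (1 - lam) * L_mse(val)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * bce_with_logits(cls_logits, cls_onehot) + (1.0 - lam) * mse(
        val_preds, val_targets
    )
```

Setting `lam=1.0` reduces the objective to pure classification and `lam=0.0` to pure regression. Intermediate values trade the two off, which is the role $\lambda$ plays in the formula.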
Abstract
The oxygen saturation level in the blood (SaO2) is crucial for health, particularly in relation to sleep-related breathing disorders. However, continuous monitoring of SaO2 is time-consuming, and its readings vary considerably with the patient's condition. Recently, optical coherence tomography angiography (OCTA) has shown promise for rapid and effective screening of eye-related lesions, and it may also offer a route to diagnosing sleep-related disorders. To bridge the gap between OCTA imaging and SaO2 assessment, our paper presents three key contributions. First, we propose JointViT, a novel model based on the Vision Transformer architecture that is supervised with a joint loss function. Second, we introduce a balancing augmentation technique during data preprocessing that improves the model's performance, particularly on the long-tailed distribution of the OCTA dataset. Finally, comprehensive experiments on the OCTA dataset show that our proposed method significantly outperforms other state-of-the-art methods, with improvements of up to 12.28% in overall accuracy. This advancement lays the groundwork for the future use of OCTA in diagnosing sleep-related disorders. See the project website: https://steve-zeyu-zhang.github.io/JointViT
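The abstract does not describe the balancing augmentation in detail. One common way to realize this idea is to oversample minority classes with augmented copies until every class matches the largest one. The sketch below is written under that assumption and is not the authors' pipeline. The slice representation (`Slice`), the `hflip` transform, and `balance_by_augmentation` are hypothetical stand-ins for whatever transforms and target counts the method actually uses.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation of one 2D OCTA slice as a grid of intensities.
Slice = List[List[float]]


def hflip(img: Slice) -> Slice:
    """Hypothetical augmentation: mirror a 2D slice left-to-right."""
    return [row[::-1] for row in img]


def balance_by_augmentation(
    samples: Sequence[Tuple[Slice, int]],
    augment: Callable[[Slice], Slice] = hflip,
    seed: int = 0,
) -> List[Tuple[Slice, int]]:
    """Oversample every minority class with augmented copies until all
    classes match the size of the largest class (an assumed strategy)."""
    if not samples:
        return []
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    by_class: Dict[int, List[Slice]] = defaultdict(list)
    for img, label in samples:
        by_class[label].append(img)
    target = max(len(imgs) for imgs in by_class.values())
    balanced = list(samples)  # keep every original sample unchanged
    for label, imgs in by_class.items():
        for _ in range(target - len(imgs)):
            source = rng.choice(imgs)  # pick an existing minority-class slice
            balanced.append((augment(source), label))
    return balanced
```

Because only minority classes receive new samples, the majority class stays unchanged. The tail classes are filled with transformed variants rather than exact duplicates, which is the intent of augmenting instead of plainly resampling.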
