OU-CoViT: Copula-Enhanced Bi-Channel Multi-Task Vision Transformers with Dual Adaptation for OU-UWF Images

Yang Li; Jianing Deng; Chong Zhong; Danjuan Yang; Meiyan Li; A. H. Welsh; Aiyi Liu; Xingtao Zhou; Catherine C. Liu; Bo Fu

TL;DR

The paper tackles myopia screening from ultra-widefield fundus images by proposing OU-CoViT, a Copula-enhanced bi-channel Vision Transformer that jointly models four mixed discrete–continuous clinical scores. It introduces a Gaussian copula–based Copula Loss with a closed-form joint density and a dual-adaptation bi-channel architecture to capture interocular asymmetries, enabling effective transfer learning via LoRA on small medical datasets. Empirical results show that combining Copula Loss with dual adaptation yields substantial gains in both regression (AL) and classification (HM) tasks over single-eye baselines and CNN/ViT variants, with strong generalization to other backbones. The approach offers a flexible, generalizable framework for multi-task learning with heterogeneous multi-channel inputs in ophthalmology and beyond.

Abstract

Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging, combined with joint modeling of multiple discrete and continuous clinical scores, presents a promising new paradigm for multi-task problems in ophthalmology. The bi-channel framework that arises from the ophthalmic phenomenon of "interocular asymmetries" between both eyes (OU) calls for new ways of applying state-of-the-art (SOTA) transformer-based models. However, applying copula models to multiple mixed discrete-continuous labels in deep learning (DL) is challenging. Moreover, applying large transformer-based models to small medical datasets is difficult because of overfitting and computational resource constraints. To resolve these challenges, we propose OU-CoViT, a novel Copula-Enhanced Bi-Channel Multi-Task Vision Transformer with Dual Adaptation for OU-UWF images, which can i) incorporate conditional correlation information across multiple discrete and continuous labels within a deep learning framework (by deriving the closed form of a novel Copula Loss); ii) take OU inputs subject to both high correlation and interocular asymmetries using a bi-channel model with dual adaptation; and iii) enable the adaptation of large vision transformer (ViT) models to small medical datasets. Experiments demonstrate that OU-CoViT significantly improves prediction performance compared to single-channel baseline models trained with empirical loss. Furthermore, the architecture of OU-CoViT allows our dual adaptation and Copula Loss to generalize and extend to various ViT variants and large DL models on small medical datasets. Our approach opens up new possibilities for joint modeling of heterogeneous multi-channel input and mixed discrete-continuous clinical scores in medical practice, and has the potential to advance AI-assisted clinical decision-making in medical domains beyond ophthalmology.

Paper Structure (23 sections, 2 theorems, 10 equations, 3 figures, 2 tables)

Introduction
Related work
Deep learning with UWF fundus imaging
Multi-task learning with correlation
Adaptation methods
Methods
Copula modeling
Gaussian copula for multi-label regression-classification tasks
4-dimensional Copula Loss
Estimation of copula parameters
Dual adaptation
Adapter modules for interocular asymmetries
LoRA for transfer learning
End-to-end OU-CoViT
Experiments
...and 8 more sections

Key Result

Theorem 1

Suppose that marginally $y_j \sim N(\mu_j,\sigma_j^2)$ for $j=1,2$, $y_3 \sim \text{Bernoulli}(p_3)$, and $y_4 \sim \text{Bernoulli}(p_4)$. Let [...] be the correlation matrix in the Gaussian copula. Then the closed form of the log joint density is [...], where $\bm{q} = \left(\frac{y_1-\mu_1}{\sigma_1},\frac{y_2-\mu_2}{\sigma_2}\right)^T$, $C$ is a constant and the values o...
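To make the marginal setup of Theorem 1 concrete, the following is a minimal sketch of how four labels with these margins (two Gaussian, two Bernoulli) can be generated from a Gaussian copula through a shared latent normal vector. It illustrates the generic copula construction only, not the paper's Copula Loss or its parameter estimation. The function names, the correlation matrix `EXAMPLE_CORR`, and all numeric values are hypothetical and chosen purely for illustration.

```python
import math
import random
from statistics import NormalDist


def cholesky(mat):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(mat[i][i] - s)
            else:
                low[i][j] = (mat[i][j] - s) / low[j][j]
    return low


def sample_mixed_copula(n, corr, mu, sigma, p, seed=0):
    """Draw n tuples (y1, y2, y3, y4) from a Gaussian copula with
    y1, y2 ~ Normal(mu_j, sigma_j^2) and y3, y4 ~ Bernoulli(p_j).

    corr is the 4x4 copula correlation matrix (hypothetical values here).
    """
    rng = random.Random(seed)
    low = cholesky(corr)
    std = NormalDist()
    # A Bernoulli(p) margin is obtained by thresholding the latent normal:
    # y = 1 exactly when z > Phi^{-1}(1 - p), so that P(y = 1) = p.
    cuts = [std.inv_cdf(1.0 - pj) for pj in p]
    samples = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(4)]
        # Correlated latent vector z ~ N(0, corr).
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(4)]
        y1 = mu[0] + sigma[0] * z[0]
        y2 = mu[1] + sigma[1] * z[1]
        y3 = 1 if z[2] > cuts[0] else 0
        y4 = 1 if z[3] > cuts[1] else 0
        samples.append((y1, y2, y3, y4))
    return samples


# Hypothetical correlation matrix: strong dependence between the two
# continuous labels, moderate dependence elsewhere.
EXAMPLE_CORR = [
    [1.0, 0.9, 0.5, 0.4],
    [0.9, 1.0, 0.4, 0.5],
    [0.5, 0.4, 1.0, 0.6],
    [0.4, 0.5, 0.6, 1.0],
]

if __name__ == "__main__":
    draws = sample_mixed_copula(5, EXAMPLE_CORR, (0.0, 0.0), (1.0, 1.0), (0.3, 0.2))
    for row in draws:
        print(row)
```

The latent vector carries all of the dependence, while each margin keeps its own distribution; this separation is what lets a copula model join continuous and binary labels in one density.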

Figures (3)

Figure 1: Architecture of proposed OU-CoViT.
Figure 2: (a) Original transformer block in classic single-channel ViT; (b) Detailed architecture of proposed dual adaptation in one transformer block with bi-channel modeling.
Figure 3: Prediction performances of the UWF dataset under different LoRA ranks ($r=4,8,\& 16$).
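As a rough illustration of what the LoRA rank $r$ in Figure 3 controls, the sketch below applies the standard LoRA update $h = Wx + \frac{\alpha}{r} BAx$ to a single frozen weight matrix and counts the trainable parameters for the three ranks. The hidden size of 768, the scaling factor `alpha`, and the helper names are assumptions made for this example; the source does not state which layers are adapted or which hyperparameters the authors used.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def matvec(mat, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def lora_forward(x, w, a, b, alpha):
    """Compute h = W x + (alpha / r) * B (A x) with W frozen.

    w: d_out x d_in frozen weight, a: r x d_in, b: d_out x r.
    """
    rank = len(a)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [h + scale * u for h, u in zip(base, update)]


if __name__ == "__main__":
    d = 768  # assumed hidden size, for illustration only
    full = d * d
    for r in (4, 8, 16):
        n = lora_param_count(d, d, r)
        print(f"rank {r}: {n} trainable vs {full} frozen ({100 * n / full:.2f}%)")
```

Because `B` is conventionally initialized to zero, the adapted layer starts out identical to the frozen one, and the number of trainable parameters grows linearly in the rank.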

Theorems & Definitions (3)

Theorem 1
Definition 1
Proposition 1
