Genetic Information Analysis of Age-Related Macular Degeneration Fellow Eye Using Multi-Modal Selective ViT

Yoichi Furukawa; Satoshi Kamiya; Yoichi Sakurada; Kenji Kashiwagi; Kazuhiro Hotta

TL;DR

This work targets non-invasive genetic risk assessment for AMD by predicting the presence of risk alleles in $ARMS2$ and $CFH$ with a multi-modal framework that fuses fundus images, OCT images, and medical records. The proposed Multi-Modal Selective ViT (MSViT) consists of a Multi-Modal Embedding (MME), a Selective Transformer (ST), and an Enhanced head. It is trained with Table-based Similar Image Augmentation (TSIA) for image augmentation and a Record Revive Algorithm (RRA) for reconstructing tabular data, and it achieves over 80% accuracy in predicting a risk allele count $=2$. Key contributions include a multi-modal embedding strategy, selective attention to informative tokens, and reconstruction-driven training, which together improve classification performance and provide interpretable token visualizations. The approach handles irregular modality counts and missing data, offering a practical pathway for integrating imaging and clinical data in AMD risk prediction.

Abstract

In recent years, the analysis of medical data using machine learning has developed significantly. The onset of Age-related Macular Degeneration (AMD) is believed to be associated with genetic polymorphisms. However, genetic analysis is costly, and artificial intelligence may offer assistance. This paper presents a method that predicts the presence of multiple susceptibility genes for AMD using fundus and Optical Coherence Tomography (OCT) images, as well as medical records. Experimental results demonstrate that integrating information from multiple modalities can predict the presence of susceptibility genes with over 80$\%$ accuracy.

Paper Structure (19 sections, 21 equations, 11 figures, 8 tables)

Introduction
Related Works
Proposed method
Multi-modal Selective ViT (MSViT)
Multi-Modal Embedding (MME)
Selective Transformer (ST)
Enhanced head
Visualization of Selected Tokens
Table-based Similar Image Augmentation
Experiments
Dataset
Training and Evaluation Methods
Results
Visualization of Selected Tokens
Ablation Study
...and 4 more sections

Figures (11)

Figure 1: To process images and text simultaneously, MSViT includes a Multi-Modal Embedding (MME) for embedding information into tokens, a Selective Transformer (ST) with selective attention to tokens based on learned probabilities and dense feature extraction using a CNN, and an Enhanced head for classification.
Figure 2: Selective Attention: Each image generates $N$ tokens through embedding, which pass through an MLP to produce selection probabilities $P_{N}$. Only the tokens with the top $K\%$ probabilities are used for attention, resulting in more efficient processing.
Figure 3: Overview of the ST module
Figure 4: Table-based Similar Image Augmentation for OCT images
Figure 5: Areas like optic disc (green outline), less relevant to AMD, have lower selection frequencies, while drusen (red outline) and surrounding areas show higher frequencies.
...and 6 more figures
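
The token selection step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `score_tokens` and `select_top_k_tokens`, the single-layer sigmoid scorer standing in for the MLP, and the choice to round the kept count up are all assumptions made for this example. The sketch covers only the selection of the top $K\%$ tokens; the attention computed over the kept tokens is omitted.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_tokens(tokens, weights, bias):
    """Hypothetical stand-in for the MLP: one linear layer followed by a sigmoid.

    Returns one selection probability per token.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(weights, token)) + bias)
        for token in tokens
    ]


def select_top_k_tokens(tokens, probabilities, keep_percent):
    """Keep the tokens whose selection probability is in the top keep_percent%.

    Returns (kept_indices, kept_tokens), both in the original token order.
    Rounding the kept count up (and keeping at least one token) is an
    assumption of this sketch.
    """
    if len(tokens) != len(probabilities):
        raise ValueError("tokens and probabilities must have the same length")
    if not 0 < keep_percent <= 100:
        raise ValueError("keep_percent must be in (0, 100]")

    n_keep = max(1, math.ceil(len(tokens) * keep_percent / 100))
    # Rank token indices from highest to lowest selection probability.
    ranked = sorted(
        range(len(tokens)), key=lambda i: probabilities[i], reverse=True
    )
    # Restore the original order of the kept tokens.
    kept_indices = sorted(ranked[:n_keep])
    return kept_indices, [tokens[i] for i in kept_indices]


if __name__ == "__main__":
    # Four toy 2-dimensional tokens; real tokens would come from the embedding.
    toy_tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, -1.0]]
    probs = score_tokens(toy_tokens, weights=[1.0, 1.0], bias=0.0)
    indices, kept = select_top_k_tokens(toy_tokens, probs, keep_percent=50)
    print(indices, kept)
```

Only the kept tokens would then be passed to attention, which is where the efficiency gain mentioned in the caption comes from: the cost of attention grows with the number of tokens it processes.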
