Genetic Information Analysis of Age-Related Macular Degeneration Fellow Eye Using Multi-Modal Selective ViT

Yoichi Furukawa, Satoshi Kamiya, Yoichi Sakurada, Kenji Kashiwagi, Kazuhiro Hotta

TL;DR

This work targets non-invasive genetic risk assessment for AMD by predicting the presence of risk alleles in $ARMS2$ and $CFH$ with a multi-modal framework that fuses fundus images, OCT images, and medical records. The proposed MSViT architecture consists of a Multi-Modal Embedding (MME), a Selective Transformer (ST), and an Enhanced head. Combined with Table-based Similar Image Augmentation (TSIA) for images and a Record Revive Algorithm (RRA) for reconstructing tabular data, it achieves over 80% accuracy in predicting a risk allele count of 2. Key contributions include a multi-modal embedding strategy, selective attention to informative tokens, and reconstruction-driven training, which together improve classification performance and yield interpretable token visualizations. The approach handles irregular modality counts and missing data, offering a practical way to integrate imaging and clinical data for AMD risk prediction.
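
The TL;DR states that the framework copes with irregular modality counts and missing record values. One common way to feed such inputs to a ViT is to concatenate per-modality tokens into a single padded sequence that carries an attention mask. The sketch below is a minimal, hypothetical illustration of that idea, not the MME described in the paper: it assumes each image has already been embedded to a `dim`-length vector, turns each record field into a token by broadcasting its scalar value (a stand-in for a learned projection), and masks out missing fields and padding. `PatientInputs`, `assemble_tokens`, and the modality ids are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = List[float]

# Hypothetical modality ids; PAD marks padding positions.
FUNDUS, OCT, RECORD, PAD = 0, 1, 2, -1


@dataclass
class PatientInputs:
    fundus: List[Vector]           # one pre-computed embedding per fundus image (count varies)
    oct: List[Vector]              # one pre-computed embedding per OCT image (count varies)
    record: List[Optional[float]]  # tabular fields; None marks a missing value


def assemble_tokens(
    p: PatientInputs, dim: int, max_tokens: int
) -> Tuple[List[Vector], List[int], List[int]]:
    """Return (tokens, modality_ids, mask) padded or truncated to max_tokens.

    mask[i] == 1 means token i may be attended to; 0 means missing or padding.
    """
    tokens: List[Vector] = []
    modality: List[int] = []
    mask: List[int] = []

    # Image tokens: any number per modality is accepted.
    for mod_id, vectors in ((FUNDUS, p.fundus), (OCT, p.oct)):
        for v in vectors:
            if len(v) != dim:
                raise ValueError(f"expected embedding of length {dim}, got {len(v)}")
            tokens.append(list(v))
            modality.append(mod_id)
            mask.append(1)

    # Record tokens: broadcasting the scalar stands in for a learned projection.
    for value in p.record:
        if value is None:
            tokens.append([0.0] * dim)
            mask.append(0)  # missing field is hidden from attention
        else:
            tokens.append([float(value)] * dim)
            mask.append(1)
        modality.append(RECORD)

    # Truncate, then pad to a fixed length so samples can be batched.
    tokens, modality, mask = tokens[:max_tokens], modality[:max_tokens], mask[:max_tokens]
    n_pad = max_tokens - len(tokens)
    tokens.extend([0.0] * dim for _ in range(n_pad))
    modality.extend([PAD] * n_pad)
    mask.extend([0] * n_pad)
    return tokens, modality, mask
```

Because the mask travels with the sequence, a patient with two OCT images and one with five can share a batch; an attention layer that honors the mask simply ignores the masked positions.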

Abstract

In recent years, there has been significant development in the analysis of medical data using machine learning. It is believed that the onset of Age-related Macular Degeneration (AMD) is associated with genetic polymorphisms. However, genetic analysis is costly, and artificial intelligence may offer assistance. This paper presents a method that predicts the presence of multiple susceptibility genes for AMD using fundus and Optical Coherence Tomography (OCT) images, as well as medical records. Experimental results demonstrate that integrating information from multiple modalities can effectively predict the presence of susceptibility genes with over 80% accuracy.

Paper Structure (19 sections, 21 equations, 11 figures, 8 tables)

Figures (11)

  • Figure 1: To process images and text simultaneously, MSViT includes a Multi-Modal Embedding (MME) for embedding information into tokens, a Selective Transformer (ST) with selective attention to tokens based on learned probabilities and dense feature extraction using a CNN, and an Enhanced head for classification.
  • Figure 2: Selective Attention: Each image is embedded into N tokens, which pass through an MLP that produces selection probabilities $P_{N}$. Only the tokens with the top K% probabilities are used for attention, which makes processing more efficient.
  • Figure 3: Overview of the ST module
  • Figure 4: Table-based Similar Image Augmentation for OCT images
  • Figure 5: Areas such as the optic disc (green outline), which is less relevant to AMD, have lower selection frequencies, while drusen (red outline) and their surrounding areas have higher frequencies.
  • ...and 6 more figures
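
Figure 2 describes selective attention only at the level of a caption: an MLP scores each of the N tokens, and attention runs over the tokens in the top K% of those scores. A minimal pure-Python sketch of that selection step follows. It assumes a one-hidden-layer MLP with a sigmoid output and single-head attention with identity projections; these choices, and every function name, are illustrative assumptions rather than the paper's implementation. The caption does not say how unselected tokens are handled afterwards, nor how the hard top-K choice is trained (it is not differentiable as written), so the sketch simply returns the selected indices and their attended outputs.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mlp_selection_prob(token: Sequence[float], w1: Sequence[Sequence[float]],
                       b1: Sequence[float], w2: Sequence[float], b2: float) -> float:
    """One-hidden-layer MLP (ReLU) with a sigmoid output: a selection probability in (0, 1)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, token)) + b) for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def top_k_percent(probs: Sequence[float], k_percent: float) -> List[int]:
    """Indices of the top K% tokens by probability (at least one), in original order."""
    n_keep = max(1, math.ceil(len(probs) * k_percent / 100))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(ranked[:n_keep])


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def selective_attention(tokens: List[Vector], probs: Sequence[float],
                        k_percent: float) -> Tuple[List[int], List[Vector]]:
    """Attend only among the selected tokens, so only len(keep)**2 scores are computed."""
    keep = top_k_percent(probs, k_percent)
    return keep, self_attention([tokens[i] for i in keep])
```

With ten tokens and K = 30, only three tokens take part in attention, so 9 query-key scores are computed instead of 100; this quadratic saving is the efficiency gain the caption refers to.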
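
Figure 5 reports how often each image region is selected. Assuming the tokens are ViT patch tokens laid out row-major on a `grid_h` × `grid_w` grid, such a map could be obtained by counting, over a set of images, how often each token index appears among the selected ones. The function below is a hypothetical sketch of that bookkeeping, not the paper's visualization code: it takes the per-image lists of selected indices (for example, the output of a top-K% selection) and returns the fraction of images in which each patch was selected.

```python
from collections import Counter
from typing import Iterable, List


def selection_frequency_map(selected_per_image: Iterable[List[int]],
                            grid_h: int, grid_w: int) -> List[List[float]]:
    """Fraction of images in which each patch (row-major index r * grid_w + c) was selected."""
    counts: Counter = Counter()
    n_images = 0
    for keep in selected_per_image:
        counts.update(set(keep))  # count each patch at most once per image
        n_images += 1
    if n_images == 0:
        return [[0.0] * grid_w for _ in range(grid_h)]
    return [[counts[r * grid_w + c] / n_images for c in range(grid_w)]
            for r in range(grid_h)]
```

Overlaying the resulting grid on the image as a heatmap gives a per-region frequency view comparable in spirit to Figure 5.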