OncoVision: Integrating Mammography and Clinical Data through Attention-Driven Multimodal AI for Enhanced Breast Cancer Diagnosis

Istiak Ahmed; Galib Ahmed; K. Shahriar Sanjid; Md. Tanzim Hossain; Md. Nishan Khan; Md. Misbah Khan; Md. Arifur Rahman; Sheikh Anisul Haque; Sharmin Akhtar Rupa; Mohammed Mejbahuddin Mia; Mahmud Hasan Mostofa Kamal; Md. Mostafa Kamal Sarker; M. Monir Uddin

TL;DR

OncoVision addresses variability in breast cancer screening by integrating mammography with structured clinical data in a single end-to-end multimodal AI pipeline. It jointly performs pixel-level segmentation of four ROIs and ten clinical-feature predictions using an attention-based encoder-decoder backbone and two late-fusion strategies, achieving state-of-the-art segmentation and high AUCs for clinical labels. A reader study and comprehensive analyses demonstrate improved radiologist confidence, reduced diagnostic time, and robust interpretability via Grad-CAM, supporting real-world deployment and potential impact in resource-limited settings. The work highlights the promise of multimodal fusion to standardize reporting, democratize access to care, and enable rapid, accurate breast cancer risk stratification and decision support.

Abstract

OncoVision is a multimodal AI pipeline that combines mammography images with clinical data to improve breast cancer diagnosis. Using an attention-based encoder-decoder backbone, it jointly segments four ROIs (masses, calcifications, axillary findings, and breast tissue) with state-of-the-art accuracy and predicts ten structured clinical features, including mass morphology, calcification type, ACR breast density, and BI-RADS category. To fuse imaging and clinical information, we developed two late-fusion strategies; by exploiting complementary multimodal data, they improve diagnostic precision and reduce inter-observer variability. Deployed as a secure, user-friendly web application, OncoVision produces structured reports with dual-confidence scoring and attention-weighted visualizations, providing real-time diagnostic support intended to improve clinician trust and facilitate medical teaching. It is designed for straightforward clinical integration, which could make screening available in underserved regions such as rural South Asia. By combining accurate segmentation with structured clinical predictions, OncoVision offers a scalable and equitable approach to detecting breast cancer earlier and enabling more timely intervention.
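The abstract does not specify how the two late-fusion strategies work, so the following is only a generic illustration of what late fusion means, not the authors' method. It is a minimal sketch of one common form, a weighted average of class probabilities produced independently by an imaging branch and a clinical-data branch. The function names, the `image_weight` parameter, and the example probabilities are all hypothetical.

```python
from typing import List, Sequence


def late_fusion_weighted_average(
    image_probs: Sequence[float],
    clinical_probs: Sequence[float],
    image_weight: float = 0.5,
) -> List[float]:
    """Fuse two per-class probability vectors by weighted averaging.

    Hypothetical helper: illustrates generic late fusion only. Each branch
    is assumed to have already produced its own class probabilities.
    """
    if len(image_probs) != len(clinical_probs):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")

    clinical_weight = 1.0 - image_weight
    fused = [
        image_weight * p + clinical_weight * q
        for p, q in zip(image_probs, clinical_probs)
    ]

    # Renormalize so the result sums to 1 even if the inputs were slightly off.
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    return [value / total for value in fused]


def predicted_class(probs: Sequence[float]) -> int:
    """Return the index of the highest-probability class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Made-up probabilities over three illustrative classes.
image_branch = [0.7, 0.2, 0.1]
clinical_branch = [0.3, 0.6, 0.1]

equal_fusion = late_fusion_weighted_average(image_branch, clinical_branch)
clinical_heavy = late_fusion_weighted_average(
    image_branch, clinical_branch, image_weight=0.2
)
```

With equal weights the imaging branch's preferred class wins in this toy example, while down-weighting the imaging branch flips the decision toward the clinical branch. This shows why the choice of fusion rule matters when the two modalities disagree.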
