OncoVision: Integrating Mammography and Clinical Data through Attention-Driven Multimodal AI for Enhanced Breast Cancer Diagnosis

Istiak Ahmed, Galib Ahmed, K. Shahriar Sanjid, Md. Tanzim Hossain, Md. Nishan Khan, Md. Misbah Khan, Md. Arifur Rahman, Sheikh Anisul Haque, Sharmin Akhtar Rupa, Mohammed Mejbahuddin Mia, Mahmud Hasan Mostofa Kamal, Md. Mostafa Kamal Sarker, M. Monir Uddin

TL;DR

OncoVision addresses variability in breast cancer screening by integrating mammography with structured clinical data in a single end-to-end multimodal AI pipeline. It jointly performs pixel-level segmentation of four ROIs and ten clinical-feature predictions using an attention-based encoder-decoder backbone and two late-fusion strategies, achieving state-of-the-art segmentation and high AUCs for clinical labels. A reader study and comprehensive analyses demonstrate improved radiologist confidence, reduced diagnostic time, and robust interpretability via Grad-CAM, supporting real-world deployment and potential impact in resource-limited settings. The work highlights the promise of multimodal fusion to standardize reporting, democratize access to care, and enable rapid, accurate breast cancer risk stratification and decision support.

Abstract

OncoVision is a multimodal AI pipeline that combines mammography images and clinical data to improve breast cancer diagnosis. Using an attention-based encoder-decoder backbone, it jointly segments four ROIs (masses, calcifications, axillary findings, and breast tissue) with state-of-the-art accuracy and predicts ten structured clinical features, including mass morphology, calcification type, ACR breast density, and BI-RADS category. To fuse imaging and clinical information, we developed two late-fusion strategies. By exploiting complementary multimodal data, these strategies improve diagnostic precision and reduce inter-observer variability. Deployed as a secure, user-friendly web application, OncoVision produces structured reports with dual-confidence scoring and attention-weighted visualizations, providing real-time diagnostic support intended to improve clinician trust and facilitate medical teaching. It can be incorporated into clinical workflows, making screening available in underserved areas around the world, such as rural South Asia. By combining accurate segmentation with clinical insight, OncoVision offers a scalable and equitable approach to AI-based mammography that supports earlier breast cancer detection and timelier intervention.

Paper Structure

This paper contains 14 sections, 19 equations, 14 figures, 8 tables.

Figures (14)

  • Figure 1: Integrated workflow and multimodal architecture of OncoVision. (a) End-to-end workflow from data acquisition and preprocessing to model training, evaluation, and web deployment. (b) Late fusion independent variant: segmentation bottleneck features feed directly into an MLP for clinical feature prediction. (c) Late fusion dependent variant: parallel processing of imaging and tabular data, followed by fusion and MLP prediction.
  • Figure 2: Comparative segmentation and diagnostic profiling using the OncoVision multimodal pipeline. (a--c) Side-by-side segmentation comparison of baseline UNet$^{++}$ and OncoVision across mass (a), calcification (b), and axilla findings with breast tissue (c). Panels show, from left to right: original mammogram, ground truth, UNet$^{++}$ prediction, OncoVision prediction, UNet$^{++}$ error map, and OncoVision error map (true positives in green, false positives in red, false negatives in blue). Zoomed insets ($3$--$4\times$) highlight improved boundary delineation and fewer segmentation errors by OncoVision in challenging regions. (d) Clinical feature predictions by OncoVision late fusion (independent variant) for five patients, including mass morphology, axilla findings, calcification distribution, ACR breast density, and BI-RADS category. All predictions match ground truth except one BI-RADS downgrade (P2: $6 \rightarrow 5$). Data are from 345 test mammograms. Error maps show pixel-level differences. No statistical tests were applied.
  • Figure 3: Comparative per-image segmentation and clinical feature confidence of UNet$^{++}$ and OncoVision. (a) Box plots of per-image IoU and DSC for calcification ($n = 309$), axilla findings ($n = 369$), breast tissue ($n = 754$), and mass ($n = 328$). Boxes show interquartile range (25th--75th percentile), median (black line), mean (black diamond), and individual scores (gray dots, jittered). Red dashed/solid lines: UNet$^{++}$ global IoU/DSC; blue dashed/solid lines: OncoVision. Wilcoxon signed-rank test; $^{***}P < 0.001$; $^{**}P < 0.01$; $^{*}P < 0.05$. Large effect sizes for calcification (DSC: Cohen's $d = -0.877$) and breast tissue (IoU: $d = -0.857$). (b) Box plots of model confidence across ten clinical features ($n = 754$ each). Green: dependent fusion; orange: independent fusion. Same box plot conventions. Wilcoxon signed-rank test with Bonferroni correction (adjusted $\alpha = 0.005$); $^{***}P < 0.001$; $^{**}P < 0.01$; $^{*}P < 0.05$. Independent variant shows higher confidence across all tasks.
  • Figure 4: Bland--Altman analysis of segmentation and confidence performance. (a) Bland--Altman plots comparing per-instance IoU and DSC between UNet$^{++}$ and OncoVision for calcification, axilla findings, breast tissue, and mass ($n = 309$--$754$ per class). Difference (UNet$^{++}$ $-$ OncoVision) versus average; solid black line, mean bias ($-0.449$ to $0.118$ for DSC); dashed gray lines, 95% limits of agreement (mean $\pm 1.96 \times \text{SD}$, spanning up to [$-1.2, 0.3$]). Points show individual image differences. OncoVision exhibits positive bias in mass and axilla findings. (b) Bland--Altman plots comparing confidence scores between late fusion independent and dependent variants for ten clinical features ($n = 754$ each). Difference (independent $-$ dependent) versus average; mean bias $0.015$--$0.104$, largest for BI-RADS ($0.101$) and ACR breast density ($0.104$); limits of agreement within $\pm 0.45$. Points show per-instance differences. Independent variant consistently higher.
  • Figure 5: Radiomic feature analysis and BI-RADS relationships in OncoVision. (a) Heatmap of correlations among the top 20 radiomic features extracted from segmented regions and tabular inputs. Red indicates strong correlations ($r > 0.80$), orange indicates moderate correlations, and blue indicates weak correlations. (b) Bar plot showing the absolute correlations of the top 10 radiomic features with the BI-RADS category, ordered from lowest (Tissue Shape 2D Maximum Diameter) to highest (Mass Shape 2D Perimeter Surface Ratio). (c) Kernel Density Estimation (KDE) plots of five radiomic features (first-order mean, sphericity, kurtosis, total energy, and pixel surface) for axilla findings, stratified by BI-RADS categories: BI-RADS 1 (blue), BI-RADS 2 (green), BI-RADS 3 (light green), BI-RADS 4 (orange), BI-RADS 5 (light red), and BI-RADS 6 (pink).
  • ...and 9 more figures
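
The quantities named in the captions of Figures 2 to 4 (pixel-level true/false positives and false negatives, IoU, DSC, Bland--Altman bias with 95% limits of agreement, and a Bonferroni-adjusted significance level) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, masks are assumed to be flat binary lists, the standard textbook definitions of each metric are assumed, the sample standard deviation is assumed for the limits of agreement, and the base significance level of 0.05 is an assumption that happens to reproduce the adjusted $\alpha = 0.005$ quoted for ten clinical features.

```python
from statistics import mean, stdev


def confusion_counts(truth, pred):
    """Count pixel-level TP, FP, FN for two flat binary masks (hypothetical helper)."""
    if len(truth) != len(pred):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, fp, fn


def iou_and_dsc(truth, pred):
    """Return (IoU, DSC) using the standard overlap definitions."""
    tp, fp, fn = confusion_counts(truth, pred)
    if tp + fp + fn == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0, 1.0
    iou = tp / (tp + fp + fn)
    dsc = 2 * tp / (2 * tp + fp + fn)
    return iou, dsc


def bland_altman(scores_a, scores_b):
    """Return (mean bias, lower limit, upper limit) for paired scores.

    Differences are taken as a - b; the limits are bias +/- 1.96 * SD,
    with SD the sample standard deviation of the differences (assumption).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # needs at least two pairs
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Toy masks, not data from the paper.
    truth = [1, 1, 1, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0]
    print(iou_and_dsc(truth, pred))

    # Toy per-image DSC values for two models, not data from the paper.
    model_a = [0.5, 0.6, 0.7]
    model_b = [0.6, 0.8, 0.9]
    print(bland_altman(model_a, model_b))

    print(bonferroni_alpha(0.05, 10))
```

With differences taken as the first model minus the second, as in the Figure 4 captions, a negative mean bias means the second model scored higher on average. In the toy masks above there are two true positives, one false positive, and one false negative, so the IoU is 2/4 and the DSC is 4/6.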