MMLNB: Multi-Modal Learning for Neuroblastoma Subtyping Classification Assisted with Textual Description Generation

Huangwei Chen; Yifei Chen; Zhenyu Yan; Mingyang Ding; Chenlei Li; Zhu Zhu; Feiwei Qin

TL;DR

Neuroblastoma (NB) pathology remains challenging due to heterogeneity and observer variability. The authors propose MMLNB, a two-stage multi-modal framework that first fine-tunes a Vision-Language Model (VLM) for pathology-aware text generation and then fuses VGG16 visual features with BERT-encoded text via the Progressive Robust Multi-Modal Fusion (PRMF) mechanism for NB subtype classification. On the private NBPath-7.5K and NBITP-1.5K datasets, MMLNB achieves state-of-the-art accuracy and AUROC, with ablations confirming the value of multi-modal fusion, LoRA-based fine-tuning, and noise-robust fusion. The approach improves interpretability and scalability in digital pathology for NB subtyping and offers a pathway toward more reliable AI-assisted clinical workflows.

Abstract

Neuroblastoma (NB), a leading cause of childhood cancer mortality, exhibits significant histopathological variability, necessitating precise subtyping for accurate prognosis and treatment. Traditional diagnostic methods rely on subjective evaluations that are time-consuming and inconsistent. To address these challenges, we introduce MMLNB, a multi-modal learning (MML) model that integrates pathological images with generated textual descriptions to improve classification accuracy and interpretability. The approach follows a two-stage process. First, we fine-tune a Vision-Language Model (VLM) to enhance pathology-aware text generation. Second, the fine-tuned VLM generates textual descriptions, and a dual-branch architecture independently extracts visual and textual features. These features are fused via a Progressive Robust Multi-Modal Fusion (PRMF) block for stable training. Experimental results show that MMLNB is more accurate than single-modal models. Ablation studies demonstrate the importance of multi-modal fusion, fine-tuning, and the PRMF mechanism. This research creates a scalable AI-driven framework for digital pathology, enhancing reliability and interpretability in NB subtyping classification. Our source code is available at https://github.com/HovChen/MMLNB.
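
The dual-branch idea in the abstract (separate visual and textual features, fused before classification) can be illustrated with a minimal sketch. This is not the PRMF block, whose internals are not described here; it swaps in a plain learned gate as a stand-in fusion. The feature vectors are random placeholders for what the VGG16 and BERT branches would produce, and the dimensions, the number of subtypes, and all function names (`init_layer`, `fuse_and_classify`) are hypothetical choices made only for illustration.

```python
import math
import random

# Toy sizes, assumed for illustration only (not taken from the paper).
VISUAL_DIM = 8      # stand-in for the visual-branch feature size
TEXT_DIM = 6        # stand-in for the text-branch feature size
SHARED_DIM = 4      # common dimension both branches are projected to
NUM_SUBTYPES = 3    # hypothetical number of NB subtypes


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Randomly initialised (untrained) weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(visual_feat, text_feat, params):
    """Project both modalities, mix them with a sigmoid gate, then classify."""
    v = linear(visual_feat, *params["visual_proj"])
    t = linear(text_feat, *params["text_proj"])
    # The gate sees both projected modalities (list concatenation) and
    # outputs one mixing weight per shared dimension.
    gate_logits = linear(v + t, *params["gate"])
    gate = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    fused = [g * a + (1.0 - g) * b for g, a, b in zip(gate, v, t)]
    return softmax(linear(fused, *params["classifier"]))


rng = random.Random(0)
params = {
    "visual_proj": init_layer(rng, VISUAL_DIM, SHARED_DIM),
    "text_proj": init_layer(rng, TEXT_DIM, SHARED_DIM),
    "gate": init_layer(rng, 2 * SHARED_DIM, SHARED_DIM),
    "classifier": init_layer(rng, SHARED_DIM, NUM_SUBTYPES),
}

# Random placeholders for the features the two branches would extract.
visual_feat = [rng.gauss(0.0, 1.0) for _ in range(VISUAL_DIM)]
text_feat = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]

probs = fuse_and_classify(visual_feat, text_feat, params)
print(probs)  # one probability per hypothetical subtype
```

The weights are untrained, so the printed probabilities carry no meaning; the sketch only shows how two independently extracted feature vectors can be brought to a shared dimension and combined before a classifier. The authors' actual implementation is in the repository linked in the abstract.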
