
DCG-Net: Dual Cross-Attention with Concept-Value Graph Reasoning for Interpretable Medical Diagnosis

Getamesay Dagnaw, Xuefei Yin, Muhammad Hassan Maqsood, Yanming Zhu, Alan Wee-Chung Liew

Abstract

Deep learning models have achieved strong performance in medical image analysis, but their internal decision processes remain difficult to interpret. Concept Bottleneck Models (CBMs) partially address this limitation by structuring predictions through human-interpretable clinical concepts. However, existing CBMs typically overlook the contextual dependencies among concepts. To address this gap, we propose DCG-Net, an end-to-end interpretable framework that integrates multimodal alignment with structured concept reasoning. DCG-Net introduces a Dual Cross-Attention module that replaces cosine similarity matching with bidirectional attention between visual tokens and canonicalized textual concept-value prototypes, enabling spatially localized evidence attribution. To capture the relational structure inherent to clinical concepts, we develop a Parametric Concept Graph initialized with Positive Pointwise Mutual Information (PPMI) priors and refined through sparsity-controlled message passing. This formulation models inter-concept dependencies in a manner consistent with clinical domain knowledge. Experiments on white blood cell morphology and skin lesion diagnosis demonstrate that DCG-Net achieves state-of-the-art classification performance while producing clinically interpretable diagnostic explanations.
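
To make the graph initialization concrete, the sketch below shows one common way to compute a Positive Pointwise Mutual Information matrix from binary concept-value annotations. It is a minimal illustration, not the authors' implementation: the function name `ppmi_matrix`, the binary annotation layout, and the choice to zero the diagonal are all assumptions introduced here.

```python
import numpy as np


def ppmi_matrix(labels: np.ndarray) -> np.ndarray:
    """Return a PPMI co-occurrence matrix (hypothetical helper).

    labels: (num_samples, num_nodes) binary array, where entry [s, i] is 1
    if concept-value node i is annotated as present in sample s (assumed layout).
    """
    labels = labels.astype(np.float64)
    num_samples = labels.shape[0]

    joint = (labels.T @ labels) / num_samples   # empirical P(i, j)
    marginal = labels.mean(axis=0)              # empirical P(i)
    expected = np.outer(marginal, marginal)     # P(i) * P(j) under independence

    # PMI = log(P(i, j) / (P(i) P(j))); pairs that never co-occur give -inf or NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(joint / expected)
    pmi[~np.isfinite(pmi)] = 0.0

    ppmi = np.maximum(pmi, 0.0)                 # keep only positive associations
    np.fill_diagonal(ppmi, 0.0)                 # assumption: no self-loops in the prior
    return ppmi


# Toy annotations: nodes 0 and 1 always co-occur, node 2 never appears with them.
toy_labels = np.array([[1, 1, 0],
                       [1, 1, 0],
                       [0, 0, 1],
                       [0, 0, 1]])
prior = ppmi_matrix(toy_labels)
```

In this toy example the entry for nodes 0 and 1 equals log 2, while every entry involving node 2 is clipped to zero. In a model of the kind the abstract describes, a matrix like `prior` would only seed the edge weights, which are then refined during training.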

Paper Structure (27 sections, 16 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Overview of DCG-Net. The CDE generates canonical concept-value prototypes ($T_M$). A vision transformer encodes images into patch-level visual tokens ($V$). The Dual Cross-Attention (DCA) module aligns visual and textual representations via bidirectional attention at the concept-value level, generating local evidence and global relevance ($\boldsymbol{\alpha}_M$), which are gated to produce the initial node features $\mathbf{H}^{(0)}$. The Parametric Concept Graph (PCG) models concept-value dependencies using PPMI-initialized edges refined through learnable message passing, resulting in the refined state $\mathbf{H}^{(L)}$ that forms the interpretable bottleneck for diagnosis.
  • Figure 2: Qualitative explanation of DCG-Net. (A) Input image with predicted diagnosis, concept-value probabilities, and ground-truth labels. (B) Top concept contributions ($\alpha \times p_{\text{present}}$), showing which clinical findings most strongly support the model's diagnostic decision. (C) PCG-based relational reasoning, showing each top concept and its associated concept-value nodes, reflecting learned dependencies among concept-value pairs.
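
The bidirectional alignment described in the Figure 1 caption can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's module: it omits learned query/key/value projections and multi-head structure, and the pooling used for global relevance and the multiplicative gate are guesses at one plausible reading of the caption. The names `dual_cross_attention`, `local_evidence`, and `alpha` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dual_cross_attention(V: np.ndarray, T: np.ndarray):
    """Sketch of bidirectional attention between patches and prototypes.

    V: (num_patches, d) visual tokens.
    T: (num_prototypes, d) concept-value text prototypes.
    Returns gated node features, global relevance, and per-prototype patch maps.
    """
    d = V.shape[1]
    scores = (T @ V.T) / np.sqrt(d)               # (num_prototypes, num_patches)

    # Text-to-image: each prototype attends over patches (a spatial evidence map).
    text_to_image = softmax(scores, axis=1)
    local_evidence = text_to_image @ V            # (num_prototypes, d)

    # Image-to-text: each patch attends over prototypes; averaging over patches
    # gives one relevance weight per prototype (assumed pooling choice).
    image_to_text = softmax(scores.T, axis=1)     # (num_patches, num_prototypes)
    alpha = image_to_text.mean(axis=0)            # (num_prototypes,)

    # Assumed gate: scale local evidence by global relevance to form H^(0).
    H0 = alpha[:, None] * local_evidence
    return H0, alpha, text_to_image


rng = np.random.default_rng(0)
V_demo = rng.normal(size=(16, 8))   # 16 patches, 8-dim tokens (toy sizes)
T_demo = rng.normal(size=(5, 8))    # 5 concept-value prototypes
H0_demo, alpha_demo, maps_demo = dual_cross_attention(V_demo, T_demo)
```

Each row of `maps_demo` is a distribution over patches, so it can be reshaped to the patch grid and overlaid on the image as a localization map, which is the kind of spatially localized evidence the abstract refers to.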