A Retrospective Systematic Study on Hierarchical Sparse Query Transformer-assisted Ultrasound Screening for Early Hepatocellular Carcinoma

Chaoyin She, Ruifang Lu, Danni He, Jiayi Lv, Yadan Lin, Meiqing Cheng, Hui Huang, Fengyu Ye, Lida Chen, Wei Wang, Qinghua Huang

TL;DR

This work addresses the limited sensitivity and variability of ultrasound screening for hepatocellular carcinoma by introducing the Hierarchical Sparse Query Transformer (HSQformer), a CNN-ViT hybrid that uses latent-space representations and a sparse Mixture-of-Experts to fuse local and global features without structural redundancy. The model extracts multi-scale features with ConvNeXt and Swin Transformer, projects them into a latent space, and passes them through a four-stage HSQformer backbone with Cross-Self-attention Mixed Experts to achieve hierarchical, sparse feature integration. Across single-center, multi-center, and high-risk patient cohorts, HSQformer outperforms state-of-the-art models (for example, reaching 95.38% AUC in multi-center testing), matches senior radiologists, and outperforms junior radiologists. The approach generalizes across cohorts, could help standardize HCC screening, and ships with an open-source codebase to support adoption and further research in AI-assisted ultrasound diagnostics.
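The latent-space fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head with no learned projections, uses random vectors as stand-ins for ConvNeXt and Swin Transformer features, and all names (`cross_attention`, `latent_queries`, `cnn_tokens`, `vit_tokens`) are hypothetical. It only shows how a small set of latent queries can read from both feature sets at once.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens):
    """Each query attends over all tokens (single head, no learned projections)."""
    d = len(tokens[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every token.
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d) for t in tokens]
        weights = softmax(scores)
        # Weighted average of the tokens, one output vector per query.
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return out


rng = random.Random(0)
dim = 8
# Assumption: random stand-ins for features from the two backbones.
cnn_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
vit_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
# Assumption: four latent queries; the source does not state the number used.
latent_queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]

# The queries read from the concatenated local (CNN) and global (ViT) tokens.
fused = cross_attention(latent_queries, cnn_tokens + vit_tokens)
```

Because the number of queries is fixed, the fused representation has the same size regardless of how many backbone tokens are supplied, which is one common motivation for querying from a latent space.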

Abstract

Hepatocellular carcinoma (HCC), the third leading cause of cancer-related mortality worldwide, demands urgent improvements in early detection to enhance patient survival. While ultrasound remains the preferred screening modality due to its cost-effectiveness and real-time capabilities, its sensitivity (59%-78%) depends heavily on radiologists' expertise, leading to inconsistent diagnostic outcomes and operational inefficiencies. Recent advances in AI offer promising ways to bridge this gap. This study introduces the Hierarchical Sparse Query Transformer (HSQformer), a novel hybrid architecture that combines CNNs' local feature extraction with Vision Transformers' global contextual awareness through latent space representation and sparse learning. By dynamically activating task-specific experts via a Mixture-of-Experts (MoE) framework, HSQformer achieves hierarchical feature integration without structural redundancy. Evaluated across three clinical scenarios (single-center, multi-center, and high-risk patient cohorts), HSQformer outperforms state-of-the-art models (e.g., 95.38% AUC in multi-center testing) and matches senior radiologists' diagnostic accuracy while significantly surpassing that of junior radiologists. These results highlight the potential of AI-assisted tools to standardize HCC screening, reduce dependency on human expertise, and improve early diagnosis rates. The full code is available at https://github.com/Asunatan/HSQformer.
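The idea of dynamically activating only some experts can be sketched as follows. This is an illustration under stated assumptions, not the HSQformer code: it assumes top-k gating (a common Mixture-of-Experts routing scheme; the abstract does not say which routing rule is used), treats each expert as a plain linear map, and uses hypothetical names (`sparse_moe`, `gate_w`, `experts`).

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product: one output value per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sparse_moe(x, gate_w, experts, k=2):
    """Route x to its k highest-scoring experts and mix their outputs."""
    logits = linear(gate_w, x)  # one gating score per expert
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gate = softmax([logits[i] for i in top])  # renormalize over selected experts
    out = [0.0] * len(experts[0])
    for g, i in zip(gate, top):
        y = linear(experts[i], x)  # only the selected experts are evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


rng = random.Random(0)
dim, num_experts = 8, 4  # assumption: sizes chosen only for illustration
x = [rng.gauss(0, 1) for _ in range(dim)]
gate_w = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [
    [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(num_experts)
]

y, chosen = sparse_moe(x, gate_w, experts, k=2)
```

With `k=2` and four experts, half of the expert computations are skipped for each input, which is the sense in which the layer is sparse.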

Paper Structure

This paper contains 33 sections, 7 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: The improvement brought by simply combining CNN and ViT.
  • Figure 2: Model architecture. Overview of the HSQformer, integrating CNN and ViT features through a hierarchical sparse querying framework for efficient diagnosis.
  • Figure 3: Visualization of Human-Machine Diagnostic Efficacy Comparison.
  • Figure 4: Assessing the Impact of Stage Schemes on Model Performance.
  • Figure 5: Influence of proportional parameters on model performance at different stages.
  • ...and 5 more figures