H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging

Zhen Huang; Tao Tang; Ronghao Xu; Yangbo Wei; Wenkai Yang; Suhua Wang; Xiaoxin Sun; Han Li; Qingsong Yao

TL;DR

The paper tackles 3D landmark detection in medical imaging, where preserving fine-grained local features while also modeling global spatial context is computationally expensive. It proposes H3DE-Net, a CNN-Transformer hybrid that uses Volumetric Bi-Routing Attention (V-BRA) to capture global dependencies at reduced cost, complemented by a Super-Resolution Block and a Feature Fusion Module for precise localization. The work introduces two architectures, Anchor-Based and Anchor-Free, and gives a loss formulation for each. Experiments on a public skull CT dataset show state-of-the-art accuracy and robustness, including on scans with missing landmarks, which suggests potential for reliable clinical deployment. The method improves mean radial error (MRE) and detection rates across the complete, incomplete, and all-case settings, underlining its practical value for 3D anatomical localization.
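
The TL;DR reports gains in mean radial error and detection rates, including on scans with missing landmarks. Below is a minimal sketch of how these two metrics are commonly computed. It assumes that landmarks absent from the ground truth are simply skipped and that the detection rate is the percentage of scored landmarks within a fixed radius in millimetres. The paper's exact protocol (radii used, and how missed or spurious predictions on incomplete scans are counted) is not stated here and may differ. The function names are illustrative, not taken from the authors' code.

```python
import math


def radial_errors(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Per-landmark Euclidean error in mm.

    pred: predicted (z, y, x) voxel coordinates, one per landmark.
    gt: ground-truth (z, y, x) voxel coordinates, or None if the landmark is absent.
    spacing: voxel size in mm along (z, y, x).
    """
    errors = []
    for p, g in zip(pred, gt):
        if g is None:
            continue  # assumption: landmarks absent from the ground truth are not scored
        # Convert the voxel offset to millimetres before taking the Euclidean norm.
        errors.append(math.sqrt(sum(((a - b) * sp) ** 2 for a, b, sp in zip(p, g, spacing))))
    return errors


def mean_radial_error(errors):
    """Mean radial error (MRE) over the scored landmarks."""
    return sum(errors) / len(errors)


def success_detection_rate(errors, radius_mm):
    """Percentage of scored landmarks whose error is within radius_mm."""
    return 100.0 * sum(e <= radius_mm for e in errors) / len(errors)


gt = [(10, 10, 10), None, (0, 0, 0)]
pred = [(10, 10, 13), (5, 5, 5), (0, 4, 0)]
errs = radial_errors(pred, gt)
print(mean_radial_error(errs))  # 3.5
print(success_detection_rate(errs, 3.0))  # 50.0
```

Scaling each axis by its voxel spacing before the norm matters for CT, where slice thickness often differs from in-plane resolution.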

Abstract

3D landmark detection is a critical task in medical image analysis, and accurately detecting anatomical landmarks is essential for downstream medical imaging tasks. However, mainstream deep learning methods in this field struggle to simultaneously capture fine-grained local features and model global spatial relationships while balancing accuracy against computational efficiency. Local feature extraction requires capturing fine-grained anatomical details, whereas global modeling requires understanding the spatial relationships within complex anatomical structures. The high-dimensional nature of 3D volumes further exacerbates these challenges: landmarks are sparsely distributed within large volumes, so processing the whole volume incurs significant computational cost. Therefore, achieving efficient and precise 3D landmark detection remains a pressing challenge in medical image analysis. In this work, we propose the Hybrid 3D DEtection Net (H3DE-Net), a novel framework that combines CNNs for local feature extraction with a lightweight attention mechanism designed to efficiently capture global dependencies in 3D volumetric data. This mechanism employs a hierarchical routing strategy to reduce computational cost while maintaining global context modeling. To our knowledge, H3DE-Net is the first 3D landmark detection model that integrates such a lightweight attention mechanism with CNNs. Additionally, multi-scale feature fusion further enhances detection accuracy and robustness. Experimental results on a public CT dataset demonstrate that H3DE-Net achieves state-of-the-art (SOTA) performance, significantly improving accuracy and robustness, particularly in scenarios with missing landmarks or complex anatomical variations. We have open-sourced our project, including code, data, and model weights.

Figures (4)