
UOD: Universal One-shot Detection of Anatomical Landmarks

Heqin Zhu, Quan Quan, Qingsong Yao, Zaiyi Liu, S. Kevin Zhou

TL;DR

This work addresses robust multi-domain one-shot anatomical landmark detection with Universal One-shot Detection (UOD), a two-stage framework that combines domain-specific and domain-shared modules. Stage I uses contrastive self-supervised learning to train a universal model on multi-domain data and generate pseudo landmark labels. Stage II trains a domain-adaptive transformer (DATR), whose encoder is built from domain-adaptive transformer blocks (DATB) and is paired with a domain-adaptive convolutional decoder, to suppress domain bias and produce dense landmark heatmaps. The stated contributions are what the authors describe as the first universal framework for multi-domain one-shot landmark detection, the domain-adaptive transformer block, and experiments on head, hand, and chest X-ray datasets that report state-of-the-art performance. Because only one annotated sample per domain is needed for training, the approach reduces the labeling burden while covering several anatomical regions. The code is publicly available.

Abstract

One-shot medical landmark detection has gained much attention and achieved great success thanks to its label-efficient training process. However, existing one-shot learning methods are highly specialized in a single domain and suffer heavily from domain preference when trained on multi-domain unlabeled data. Moreover, one-shot learning is not robust: performance drops when a sub-optimal image is chosen for annotation. To tackle these issues, we develop a domain-adaptive one-shot landmark detection framework for multi-domain medical images, named Universal One-shot Detection (UOD). UOD consists of two stages and two corresponding universal models, each designed as a combination of domain-specific modules and domain-shared modules. In the first stage, a domain-adaptive convolution model is trained with self-supervised learning to generate pseudo landmark labels. In the second stage, we design a domain-adaptive transformer to eliminate domain preference and build the global context for multi-domain data. Even though only one annotated sample from each domain is available for training, the domain-shared modules help UOD aggregate all one-shot samples to detect more robust and accurate landmarks. We evaluated the proposed UOD both qualitatively and quantitatively on three widely used public X-ray datasets in different anatomical domains (i.e., head, hand, chest) and obtained state-of-the-art performance in each domain. The code is available at https://github.com/heqin-zhu/UOD_universal_oneshot_detection.
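The first stage described above generates pseudo landmark labels from a single annotated sample. As a rough illustration only, the sketch below assumes this is done by comparing a feature vector taken at the annotated landmark with every location of an unlabeled image's feature map, using cosine similarity. The function name `match_landmark`, the feature shapes, and the random features are hypothetical stand-ins and are not taken from the UOD code.

```python
import numpy as np

def match_landmark(template_feat, query_feats):
    """Return (row, col) of the query location most similar to the template.

    template_feat: (C,) feature vector sampled at the annotated landmark
                   of the one-shot image (hypothetical input).
    query_feats:   (C, H, W) feature map of an unlabeled image.
    """
    c, h, w = query_feats.shape
    flat = query_feats.reshape(c, h * w)
    # L2-normalize so that the dot product equals cosine similarity.
    t = template_feat / (np.linalg.norm(template_feat) + 1e-8)
    q = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    sim = t @ q                       # one similarity score per location
    idx = int(np.argmax(sim))         # best-matching flattened index
    return divmod(idx, w)             # back to (row, col)

# Toy check: random features stand in for a learned encoder's output.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8, 8))
template = feats[:, 5, 2].copy()      # pretend the landmark sits at (5, 2)
pseudo_label = match_landmark(template, feats)
```

In this toy setup the template is copied from location (5, 2), so the match recovers that location. With a real encoder the template would come from the one-shot image and the query features from a different, unlabeled image.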

Paper Structure

This paper contains 9 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the UOD framework. In stage I, two universal models are learned via contrastive learning to match similar patches between the original image and the augmented one-shot sample image and to generate pseudo labels. In stage II, DATR is designed to better capture global context information across all domains and detect more accurate landmarks.
  • Figure 2: (a) The architecture of DATR in stage II, which is composed of a domain-adaptive transformer encoder and convolution adaptors [ref_u2net]. (b) Basic transformer block. (c) Domain-adaptive transformer block. Each domain-adaptive transformer block is a basic transformer block with the query matrix duplicated and a domain-adaptive diagonal for each domain. The batch normalization, activation, and patch merging are omitted.
  • Figure 3: Comparison of the single model and the universal model on the head dataset.
  • Figure 4: Qualitative comparison of UOD and CC2D [yao2021one] on the head, hand, and chest datasets. The red points $\bullet$ indicate predicted landmarks, while the green points $\bullet$ indicate ground-truth landmarks. The MRE value is displayed in the top-left corner of each image.
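
The Figure 4 caption reports an MRE value per image without defining it. Assuming MRE is the mean radial error commonly used in landmark detection (the average Euclidean distance between predicted and ground-truth landmarks), a minimal sketch of the computation could look like this. The function name `mean_radial_error` and the `spacing` parameter for converting pixels to physical units are illustrative assumptions.

```python
import numpy as np

def mean_radial_error(pred, gt, spacing=1.0):
    """Average Euclidean distance between matching landmarks.

    pred, gt: (N, 2) arrays of landmark coordinates in pixels.
    spacing:  assumed physical size of one pixel (e.g. mm per pixel);
              1.0 keeps the result in pixels.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    radial = np.linalg.norm((pred - gt) * spacing, axis=-1)  # per-landmark error
    return float(radial.mean())

# Two landmarks with offsets (3, 4) and (0, 4): errors 5 and 4 pixels.
pred = [[10, 10], [20, 24]]
gt = [[13, 14], [20, 20]]
mre = mean_radial_error(pred, gt)  # 4.5
```

Passing a `spacing` value rescales the result to physical units, so the same predictions with `spacing=0.1` would give 0.45.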