UOD: Universal One-shot Detection of Anatomical Landmarks
Heqin Zhu, Quan Quan, Qingsong Yao, Zaiyi Liu, S. Kevin Zhou
TL;DR
This work addresses multi-domain one-shot anatomical landmark detection by introducing Universal One-shot Detection (UOD), a two-stage framework whose models combine domain-specific and domain-shared modules. Stage I uses contrastive self-supervised learning to train a universal model on multi-domain data and generate pseudo landmark labels; Stage II trains a transformer encoder based on the domain-adaptive transformer block (DATB), paired with a domain-adaptive convolutional decoder, to suppress domain bias and produce dense landmark heatmaps. Key contributions are the first universal framework for multi-domain one-shot landmark detection, the DATB, and experiments on head, hand, and chest X-ray datasets that show state-of-the-art performance while requiring only one annotated sample per domain. The reduced labeling burden makes landmark detection more practical across diverse anatomical regions. The code is publicly available.
Abstract
One-shot medical landmark detection has gained much attention and achieved great success thanks to its label-efficient training process. However, existing one-shot learning methods are highly specialized in a single domain and suffer heavily from domain preference when trained on multi-domain unlabeled data. Moreover, one-shot learning is not robust: its performance drops when a sub-optimal image is chosen for annotation. To tackle these issues, we develop a domain-adaptive one-shot landmark detection framework for multi-domain medical images, named Universal One-shot Detection (UOD). UOD consists of two stages and two corresponding universal models, each designed as a combination of domain-specific and domain-shared modules. In the first stage, a domain-adaptive convolutional model is trained with self-supervised learning to generate pseudo landmark labels. In the second stage, we design a domain-adaptive transformer to eliminate domain preference and build global context for multi-domain data. Even though only one annotated sample from each domain is available for training, the domain-shared modules help UOD aggregate all one-shot samples to detect landmarks more robustly and accurately. We evaluated the proposed UOD both qualitatively and quantitatively on three widely used public X-ray datasets from different anatomical domains (head, hand, and chest) and obtained state-of-the-art performance in each domain. The code is available at https://github.com/heqin-zhu/UOD_universal_oneshot_detection.
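
The idea of combining domain-shared and domain-specific modules can be illustrated with a minimal sketch. The class below is a hypothetical toy layer, not the architecture used in UOD: it assumes a single shared weight matrix applied to every input, followed by a per-domain scale and bias selected by a domain key. The names `DomainAdaptiveLinear`, `scale`, and `bias` are assumptions introduced only for this illustration.

```python
from typing import Dict, List


class DomainAdaptiveLinear:
    """Toy layer: one shared weight matrix plus a per-domain scale and bias.

    Hypothetical illustration only; not the model described in the paper.
    """

    def __init__(self, shared_weight: List[List[float]], domains: List[str]) -> None:
        # Domain-shared parameters, reused for every domain.
        self.shared_weight = shared_weight
        out_dim = len(shared_weight)
        # Domain-specific parameters, initialised to the identity transform.
        self.scale: Dict[str, List[float]] = {d: [1.0] * out_dim for d in domains}
        self.bias: Dict[str, List[float]] = {d: [0.0] * out_dim for d in domains}

    def forward(self, x: List[float], domain: str) -> List[float]:
        # Shared projection: identical computation for all domains.
        shared = [sum(w * v for w, v in zip(row, x)) for row in self.shared_weight]
        # Domain-specific adjustment, chosen by the domain key.
        return [
            s * h + b
            for s, h, b in zip(self.scale[domain], shared, self.bias[domain])
        ]


layer = DomainAdaptiveLinear([[1.0, 0.0], [0.0, 2.0]], ["head", "hand", "chest"])
layer.scale["hand"] = [0.5, 0.5]    # pretend these were learned for hand X-rays
layer.bias["chest"] = [1.0, -1.0]   # pretend these were learned for chest X-rays

x = [2.0, 3.0]
head_out = layer.forward(x, "head")
hand_out = layer.forward(x, "hand")
chest_out = layer.forward(x, "chest")
```

In this toy setting, the shared weights receive training signal from every domain, which mirrors the stated motivation for aggregating one-shot samples across domains, while the small per-domain parameters leave room for domain-specific adjustment.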
