Anatomically-guided masked autoencoder pre-training for aneurysm detection
Alberto Mario Ceballos-Arroyo, Jisoo Kim, Chu-Hsuan Lin, Lei Qin, Geoffrey S. Young, Huaizu Jiang
TL;DR
This work tackles intracranial aneurysm (IA) detection under limited annotated data by pre-training a 3D Vision Transformer with an anatomically guided masked autoencoder on 6,796 unlabeled head CT scans. It introduces artery-aware patch sampling and reconstruction of artery distance maps, so the model learns robust arterial representations before a DETR-like detector is fine-tuned for lesion localization and sizing. The approach achieves 4–8 percentage point gains in lesion-level sensitivity at a fixed 0.5 false positives (FP) per scan on out-of-distribution data while matching the state of the art (SOTA) on in-distribution data, demonstrating improved generalization and effective use of unannotated data. Overall, the fully Transformer-based pipeline with factorized 3D attention and anatomically guided MAE pre-training offers a scalable path toward clinically robust IA detection with potential for multimodal extensions.
Abstract
Intracranial aneurysms are a major cause of morbidity and mortality worldwide, and detecting them manually is a complex, time-consuming task. Although automated solutions are desirable, the limited availability of training data makes it difficult to develop them using typical supervised learning frameworks. In this work, we propose a novel strategy that uses more widely available unannotated head CT scan data to pre-train a 3D Vision Transformer model prior to fine-tuning for the aneurysm detection task. Specifically, we modify masked autoencoder (MAE) pre-training in three ways: we use a factorized self-attention mechanism to make 3D attention computationally viable; we restrict the masked patches to areas near arteries, where aneurysms are likely to occur; and we reconstruct not only CT scan intensity values but also artery distance maps, which describe the distance between each voxel and the closest artery, thereby enhancing the backbone's learned representations. Compared with SOTA aneurysm detection models, our approach improves absolute sensitivity by 4-8 percentage points at 0.5 false positives per scan. Code and weights will be released.
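
To make the artery distance map and the artery-restricted masking concrete, the following is a minimal pure-Python sketch on a toy volume. It is not the authors' implementation: the function names, the nested-list volume layout, the `max_distance` threshold, and the choice to apply the mask ratio to the near-artery patches only are all illustrative assumptions.

```python
import math
import random


def artery_distance_map(artery_mask):
    """Euclidean distance from each voxel to the closest artery voxel.

    artery_mask is a nested list indexed [z][y][x] with 1 marking artery
    voxels (a hypothetical input, e.g. from a vessel segmentation).
    Brute force, so only suitable for toy volumes.
    """
    depth, height, width = len(artery_mask), len(artery_mask[0]), len(artery_mask[0][0])
    artery_voxels = [
        (z, y, x)
        for z in range(depth) for y in range(height) for x in range(width)
        if artery_mask[z][y][x]
    ]
    if not artery_voxels:
        raise ValueError("artery_mask contains no artery voxels")
    return [
        [[min(math.dist((z, y, x), a) for a in artery_voxels) for x in range(width)]
         for y in range(height)]
        for z in range(depth)
    ]


def candidate_patches(distance_map, patch, max_distance):
    """Indices (pz, py, px) of cubic patches that come within max_distance of an artery."""
    depth, height, width = len(distance_map), len(distance_map[0]), len(distance_map[0][0])
    candidates = []
    for pz in range(depth // patch):
        for py in range(height // patch):
            for px in range(width // patch):
                nearest = min(
                    distance_map[z][y][x]
                    for z in range(pz * patch, (pz + 1) * patch)
                    for y in range(py * patch, (py + 1) * patch)
                    for x in range(px * patch, (px + 1) * patch)
                )
                if nearest <= max_distance:
                    candidates.append((pz, py, px))
    return candidates


def sample_masked_patches(candidates, mask_ratio, rng):
    """Randomly pick a fraction of the near-artery patches to mask (assumed policy)."""
    n_masked = round(mask_ratio * len(candidates))
    return set(rng.sample(candidates, n_masked))


# Toy 8x8x8 volume with one straight "artery" running along x at z=1, y=1.
mask = [[[1 if (z == 1 and y == 1) else 0 for x in range(8)]
         for y in range(8)] for z in range(8)]
distances = artery_distance_map(mask)
near = candidate_patches(distances, patch=4, max_distance=3.5)
masked = sample_masked_patches(near, mask_ratio=0.5, rng=random.Random(0))
```

In this sketch, the two patches farthest from the toy artery are never eligible for masking, and the values in `distances` are what a second reconstruction head would regress alongside the CT intensities of each masked patch. On real scans, the brute-force loop would normally be replaced by a dedicated Euclidean distance transform such as `scipy.ndimage.distance_transform_edt`.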
