Anatomically-guided masked autoencoder pre-training for aneurysm detection

Alberto Mario Ceballos-Arroyo, Jisoo Kim, Chu-Hsuan Lin, Lei Qin, Geoffrey S. Young, Huaizu Jiang

TL;DR

This work tackles intracranial aneurysm (IA) detection with limited annotated data by pre-training a 3D Vision Transformer with an anatomically guided masked autoencoder (MAE) on 6,796 unlabeled head CT scans. It introduces artery-aware patch sampling and reconstruction of artery distance maps, which encourage the model to learn robust arterial representations before a DETR-like detector is fine-tuned for lesion localization and sizing. The approach improves lesion-level sensitivity by 4–8 percentage points at a fixed 0.5 false positives (FP) per scan on out-of-distribution data while matching SOTA on in-distribution data, indicating better generalization and effective use of unannotated data. Overall, the fully Transformer-based pipeline, which combines factorized 3D attention with anatomically guided MAE pre-training, offers a scalable path toward clinically robust IA detection, with potential for multimodal extensions.
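The summary credits factorized 3D attention with making a ViT over whole CT volumes tractable. The factorization is not spelled out here, so the sketch below assumes an axial scheme: each token attends along depth, then height, then width, instead of attending to every other token at once. It uses NumPy, a single head, and no learned query/key/value projections (Q = K = V = x), all simplifying assumptions; `attention_pairs` is a hypothetical helper that counts scored query-key pairs to show where the savings come from.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attend_along_axis(x: np.ndarray, axis: int) -> np.ndarray:
    """Single-head self-attention restricted to one spatial axis.

    x has shape (D, H, W, C); a token only attends to tokens that share its
    other two coordinates. Assumption: Q = K = V = x (no learned projections).
    """
    moved = np.moveaxis(x, axis, -2)  # (..., L, C) with L = length of `axis`
    c = moved.shape[-1]
    scores = moved @ np.swapaxes(moved, -1, -2) / np.sqrt(c)  # (..., L, L)
    out = softmax(scores, axis=-1) @ moved                     # (..., L, C)
    return np.moveaxis(out, -2, axis)                          # back to (D, H, W, C)


def factorized_attention_3d(x: np.ndarray) -> np.ndarray:
    """Axial factorization: attend along depth, then height, then width."""
    for axis in (0, 1, 2):
        x = attend_along_axis(x, axis)
    return x


def attention_pairs(d: int, h: int, w: int) -> tuple[int, int]:
    """Query-key pairs scored by full 3D attention vs. the axial factorization."""
    n = d * h * w
    return n * n, n * (d + h + w)
```

For a 16 x 16 x 16 token grid, full attention scores 16,777,216 pairs per layer while the axial version scores 196,608, roughly 85 times fewer; the price is that global context must be assembled over the three sequential passes.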

Abstract

Intracranial aneurysms are a major cause of morbidity and mortality worldwide, and detecting them manually is a complex, time-consuming task. Although automated solutions are desirable, the limited availability of annotated training data makes them difficult to develop with typical supervised learning frameworks. In this work, we propose a novel pre-training strategy that uses more widely available unannotated head CT scans to pre-train a 3D Vision Transformer before fine-tuning it for aneurysm detection. Specifically, we modify masked autoencoder (MAE) pre-training in three ways: we use a factorized self-attention mechanism to make 3D attention computationally viable; we restrict the masked patches to areas near arteries, where aneurysms are likely to occur; and we reconstruct not only CT intensity values but also artery distance maps, which give the distance from each voxel to the closest artery, thereby enhancing the backbone's learned representations. Compared with SOTA aneurysm detection models, our approach gains 4–8 percentage points of absolute sensitivity at 0.5 false positives per scan. Code and weights will be released.
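The anatomical ingredients described above (artery distance maps, artery-restricted masking, and a reconstruction target that covers both CT intensities and distance maps) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the binary artery mask is assumed to come from an upstream segmentation, and the patch size `p`, the candidate rule (a patch is eligible if any voxel lies within `max_dist` voxels of an artery), `mask_ratio`, and the loss weight `dist_weight` are all hypothetical choices.

```python
import numpy as np


def artery_distance_map(artery_mask: np.ndarray) -> np.ndarray:
    """Euclidean distance (in voxels) from each voxel to the closest artery voxel.

    Brute force over artery voxels, suitable only for small toy volumes.
    Returns inf everywhere if the mask contains no artery voxels.
    """
    coords = np.indices(artery_mask.shape).reshape(3, -1).T  # (N, 3) voxel coords
    best_sq = np.full(coords.shape[0], np.inf)
    for artery_voxel in np.argwhere(artery_mask):
        sq = ((coords - artery_voxel) ** 2).sum(axis=1)
        best_sq = np.minimum(best_sq, sq)
    return np.sqrt(best_sq).reshape(artery_mask.shape)


def patchify(vol: np.ndarray, p: int) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping p**3 cubes -> (num_patches, p**3)."""
    d, h, w = (s // p for s in vol.shape)
    vol = vol[: d * p, : h * p, : w * p]  # drop any remainder that does not fill a cube
    cubes = vol.reshape(d, p, h, p, w, p).transpose(0, 2, 4, 1, 3, 5)
    return cubes.reshape(d * h * w, p ** 3)


def artery_guided_mask(dist_map, p, mask_ratio, max_dist, rng):
    """Boolean mask over patches (True = hidden from the encoder).

    Hypothetical rule: only patches with at least one voxel within `max_dist`
    of an artery are eligible; if too few are eligible, all of them are masked.
    """
    patch_min = patchify(dist_map, p).min(axis=1)
    candidates = np.flatnonzero(patch_min <= max_dist)
    n_mask = min(candidates.size, int(round(mask_ratio * patch_min.size)))
    chosen = rng.choice(candidates, size=n_mask, replace=False)
    mask = np.zeros(patch_min.size, dtype=bool)
    mask[chosen] = True
    return mask


def two_target_loss(pred_ct, pred_dist, ct_patches, dist_patches, mask, dist_weight=1.0):
    """Per-patch MSE averaged over masked patches, for CT intensity and distance map."""
    ct_err = ((pred_ct - ct_patches) ** 2).mean(axis=1)
    dist_err = ((pred_dist - dist_patches) ** 2).mean(axis=1)
    return ct_err[mask].mean() + dist_weight * dist_err[mask].mean()
```

The brute-force distance computation keeps the sketch dependent on NumPy alone; on real scans, `scipy.ndimage.distance_transform_edt(~artery_mask)` returns the same map far more efficiently, because it measures each non-zero (non-artery) voxel's distance to the nearest zero (artery) voxel.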

Paper Structure

This paper contains 16 sections, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Lesion-level sensitivity vs. FP rate curves for our best model and three baselines across four datasets. Our model (red) consistently achieves higher sensitivity at tolerances of 0 to 2 FPs per scan, the range that matters most for minimizing the time radiologists spend reviewing FPs.
  • Figure 2: (a) Visual depiction of our masking scheme on a single CT scan slice: lighter patches, which mostly overlap with vessel (cyan) areas, are masked; the model sees only the darker areas during pre-training. (b) Illustration of our MAE pipeline, in which both the CT scan and the distance map are reconstructed.
  • Figure 3: 3D view and corresponding CTA images. Red: ground-truth aneurysm; yellow: algorithm output; blue: artery segmentation. Top row (all TP): right MCA aneurysm (A, B); anterior communicating artery aneurysm (smaller) and left posterior communicating artery aneurysm (larger) (C, D); left ICA aneurysm (E, F). Bottom row: FP: basilar tip confluence (G, H); FP: posterior communicating artery infundibulum (I, J); FN: small ICA aneurysm (K, L).
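Figure 1 and the headline result both report lesion-level sensitivity at a fixed number of false positives per scan, i.e. one operating point on a FROC-style curve. A minimal sketch of that computation follows, assuming detections have already been matched to ground-truth lesions upstream (the matching rule is not given here). `Detection` and `sensitivity_at_fp_rate` are hypothetical names; detections from all scans are pooled and the score threshold is swept from high to low; and repeated hits on an already-found lesion are neither counted again nor treated as FPs, which is another assumption (some protocols count them as FPs).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Detection:
    score: float              # detector confidence
    lesion_id: Optional[str]  # matched ground-truth lesion, or None for a false positive


def sensitivity_at_fp_rate(
    dets: Sequence[Detection],
    num_lesions: int,
    num_scans: int,
    max_fp_per_scan: float = 0.5,
) -> float:
    """Best lesion-level sensitivity whose pooled FP-per-scan stays within budget."""
    ordered = sorted(dets, key=lambda d: d.score, reverse=True)
    found: set[str] = set()
    fps = 0
    best = 0.0
    for i, det in enumerate(ordered):
        if det.lesion_id is None:
            fps += 1
        else:
            found.add(det.lesion_id)  # re-detecting a lesion does not count twice
        # Evaluate only after every detection tied at this score is included,
        # because a score threshold cannot separate detections with equal scores.
        at_threshold = i + 1 == len(ordered) or ordered[i + 1].score != det.score
        if at_threshold and fps / num_scans <= max_fp_per_scan:
            best = max(best, len(found) / num_lesions)
    return best
```

For example, with 4 scans and a budget of 0.5 FP per scan, at most 2 pooled FPs are tolerated; lesions that are only detected below the score at which the third FP appears do not contribute to the reported sensitivity.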