An Analysis of Performance Bottlenecks in MRI Pre-Processing
Mathieu Dugré, Yohan Chatelain, Tristan Glatard
TL;DR
This work tackles the computational bottlenecks of the MRI pre-processing pipelines used in neuroimaging by profiling popular toolboxes (ANTs, FSL, FreeSurfer) with the Intel VTune profiler across fMRIPrep sub-pipelines. The authors reveal a long-tailed CPU-time distribution, in which a small subset of functions consumes the majority of the runtime. Across a diverse cohort of healthy subjects, they identify linear interpolation as the dominant bottleneck, alongside memory-access delays. A notable finding is that single-precision ANTs can incur higher makespans because of an ITK-related requirement to produce double-precision output. FreeSurfer recon-all also shows poor parallel scaling, attributed to OpenMP thread synchronization. The results provide a practical reference for optimization, highlight the need to carefully consider reduced-precision techniques and OpenMP scheduling, and underscore the challenges of profiling long-running HPC workflows. Overall, the study offers concrete targets for performance improvements and a methodological framework for evaluating MRI pre-processing pipelines.
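The long-tail observation suggests a simple way to rank optimization targets: sort functions by CPU time and keep the smallest prefix that covers a chosen share of the total. The sketch below assumes that per-function CPU times have already been exported from a profiler (for example, from a VTune hotspots report) into a Python dict. The helper `hotspot_coverage`, the 80% threshold, and the profile values are all hypothetical and do not come from the study's measurements.

```python
def hotspot_coverage(cpu_time_by_function, fraction=0.8):
    """Return the smallest set of top functions covering `fraction` of CPU time.

    Hypothetical helper: functions are ranked by CPU time (descending) and
    added until their cumulative time reaches `fraction` of the total.
    Returns (function names, share of total CPU time they cover).
    """
    total = sum(cpu_time_by_function.values())
    if total <= 0:
        return [], 0.0
    ranked = sorted(cpu_time_by_function.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0.0
    for name, seconds in ranked:
        if covered >= fraction * total:
            break  # target share already reached
        selected.append(name)
        covered += seconds
    return selected, covered / total


# Hypothetical per-function CPU seconds, for illustration only (not measured data).
profile = {
    "interpolate": 500.0,
    "gradient": 200.0,
    "resample_loop": 150.0,
    "read_image": 50.0,
    "misc_a": 40.0,
    "misc_b": 30.0,
    "misc_c": 30.0,
}

names, share = hotspot_coverage(profile, fraction=0.8)
print(names, f"{share:.0%}")  # ['interpolate', 'gradient', 'resample_loop'] 85%
```

With these made-up numbers, three of the seven functions account for 85% of the CPU time; applied to real profiles, the same ranking yields a short list of candidate functions to optimize first.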
Abstract
Magnetic Resonance Imaging (MRI) pre-processing is a critical step in neuroimaging analysis. However, the computational cost of MRI pre-processing pipelines is a major bottleneck for large cohort studies and some clinical applications. While High-Performance Computing (HPC) and, more recently, Deep Learning have been adopted to accelerate these computations, both require costly hardware that is not accessible to all researchers. It is therefore important to understand the performance bottlenecks of MRI pre-processing pipelines in order to improve their performance. Using the Intel VTune profiler, we characterized the bottlenecks of several commonly used MRI pre-processing pipelines from the ANTs, FSL, and FreeSurfer toolboxes. We found that a few functions accounted for most of the CPU time, and that linear interpolation was the largest contributor. Data access was also a substantial bottleneck. We identified a bug in the ITK library that impacts the performance of the ANTs pipeline in single precision, as well as a potential issue with OpenMP scaling in FreeSurfer recon-all. Our results provide a reference for future efforts to optimize MRI pre-processing pipelines.
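For a 3-D image, linear interpolation is trilinear: each resampled point is a weighted average of the eight voxels at the corners of the grid cell that contains it. The plain-Python sketch below illustrates that textbook scheme. It is not the ITK, FSL, or FreeSurfer implementation, the helper name `trilinear` is hypothetical, and it assumes the volume is a nested list indexed `volume[i][j][k]` with the query point strictly inside the grid (no boundary handling). In a typical row-major array, the eight corners lie in four different rows spread over two slices, so only pairs of them are adjacent in memory; this is one plausible reason why interpolation-heavy resampling is sensitive to data access.

```python
import math


def trilinear(volume, x, y, z):
    """Trilinearly interpolate volume[i][j][k] at continuous index (x, y, z).

    Hypothetical helper for illustration. Assumes 0 <= x < nx - 1,
    0 <= y < ny - 1 and 0 <= z < nz - 1; there is no boundary handling.
    """
    # Lower corner of the grid cell containing the point.
    i0, j0, k0 = math.floor(x), math.floor(y), math.floor(z)
    # Fractional offsets inside the cell, each in [0, 1).
    fx, fy, fz = x - i0, y - j0, z - k0
    value = 0.0
    # Accumulate the 8 corner voxels, each weighted by the product of its
    # per-axis weights (1 - f for the lower neighbor, f for the upper one).
    for di in (0, 1):
        wx = fx if di else 1.0 - fx
        for dj in (0, 1):
            wy = fy if dj else 1.0 - fy
            for dk in (0, 1):
                wz = fz if dk else 1.0 - fz
                value += wx * wy * wz * volume[i0 + di][j0 + dj][k0 + dk]
    return value


# Toy 3x3x3 volume whose intensity is linear in the voxel index: i + 2j + 3k.
toy = [[[float(i + 2 * j + 3 * k) for k in range(3)] for j in range(3)] for i in range(3)]
print(trilinear(toy, 0.5, 0.5, 0.5))  # 3.0
```

Because trilinear interpolation reproduces any function that is linear in each coordinate exactly, the toy volume makes the result easy to check by hand; a full resampling step would call such a routine once per output voxel, which is why it can dominate the CPU time of registration and resampling tools.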
