CLAIRE: Scalable GPU-Accelerated Algorithms for Diffeomorphic Image Registration in 3D
Andreas Mang
TL;DR
CLAIRE provides scalable, GPU-accelerated diffeomorphic image registration by casting it as a PDE-constrained variational problem controlled by a stationary velocity field. It employs a Gauss–Newton–Krylov solver with semi-Lagrangian time integration, mixed-precision GPU kernels, and MPI for distributed-memory parallelism, enabling 3D brain registrations in a few seconds on a single GPU and, with MPI, registration of volumes with billions of voxels. The framework combines several preconditioning strategies (regularization-based, two-level, and zero-velocity approximations) with parameter continuation to achieve robust convergence and diffeomorphic maps, demonstrated on the NIREP dataset with high Dice scores (~0.83–0.84). This work offers a practical route to real-time or near-real-time large-scale diffeomorphic registration suitable for clinical workflows and large imaging studies, while outlining limitations and directions for extending to non-stationary velocities, multi-modality similarity measures, and topology-changing scenarios.
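For orientation, a PDE-constrained variational problem of this kind is commonly written as shown below. This is a sketch under assumptions that the summary does not state: a squared L2 image distance, a generic regularization norm with weight \beta > 0, and symbols chosen here for illustration (m_0 for the template image, m_1 for the reference image, m for the transported intensities, v for the stationary velocity, \Omega for the image domain).

```latex
\begin{aligned}
\min_{v}\quad
  & \frac{1}{2}\int_{\Omega} \bigl(m(x,1) - m_1(x)\bigr)^2 \,\mathrm{d}x
    + \frac{\beta}{2}\,\|v\|_{V}^{2} \\
\text{subject to}\quad
  & \partial_t m(x,t) + v(x)\cdot\nabla m(x,t) = 0
    \quad \text{in } \Omega\times(0,1], \\
  & m(x,0) = m_0(x) \quad \text{in } \Omega .
\end{aligned}
```

In this form the velocity v does not depend on time, the transport equation advects the template toward the reference, and the regularization term penalizes non-smooth velocities.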
Abstract
We present our work on scalable, GPU-accelerated algorithms for diffeomorphic image registration. The associated software package is termed CLAIRE. Image registration is a nonlinear inverse problem: the task is to compute a spatial mapping between two images of the same object or scene. In diffeomorphic image registration, the set of admissible spatial transformations is restricted to maps that are smooth, one-to-one, and have a smooth inverse. We formulate diffeomorphic image registration as a variational problem governed by transport equations. We use an inexact, globalized (Gauss–)Newton–Krylov method for numerical optimization and semi-Lagrangian methods for numerical time integration. Our solver features mixed-precision, hardware-accelerated computational kernels for high computational throughput. We use the message-passing interface (MPI) for distributed-memory parallelism and deploy our code on modern high-performance computing architectures. Our solver allows us to solve clinically relevant problems in under four seconds on a single GPU. It can also be applied to large-scale 3D imaging applications with data discretized on meshes with billions of voxels. We demonstrate that our numerical framework yields high-fidelity results in only a few seconds, even if we search for an optimal regularization parameter.
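To illustrate the semi-Lagrangian idea mentioned above, the following is a minimal 1D sketch in Python/NumPy, not CLAIRE's implementation (which operates in 3D on GPUs). The function names, the periodic grid, the linear interpolation, and the second-order backward tracing of characteristics are all assumptions chosen for brevity.

```python
import numpy as np


def periodic_linear_interp(f, xq, h):
    """Linearly interpolate periodic grid data f (spacing h) at points xq."""
    n = f.size
    s = np.mod(xq / h, n)              # fractional grid index, wrapped
    i0 = np.floor(s).astype(int) % n   # left neighbor (wrapped)
    i1 = (i0 + 1) % n                  # right neighbor (wrapped)
    w = s - np.floor(s)                # interpolation weight in [0, 1)
    return (1.0 - w) * f[i0] + w * f[i1]


def semi_lagrangian_step(m, v, dt, h):
    """Advance dm/dt + v * dm/dx = 0 by one time step on a periodic 1D grid.

    m  : intensities at grid points x_i = i * h
    v  : stationary velocity at the same grid points
    dt : time step size, h : grid spacing
    """
    x = np.arange(m.size) * h
    # Trace each characteristic backward in time (Heun / RK2 scheme).
    x_star = x - dt * v                          # Euler predictor
    v_star = periodic_linear_interp(v, x_star, h)
    x_dep = x - 0.5 * dt * (v + v_star)          # departure points
    # The transported value equals the old value at the departure point.
    return periodic_linear_interp(m, x_dep, h)


# Usage: constant unit velocity, time step of four grid cells.
n = 64
h = 1.0 / n
m0 = np.sin(2.0 * np.pi * np.arange(n) * h)
m1 = semi_lagrangian_step(m0, np.ones(n), 4 * h, h)
```

In this usage example the profile moves by four grid cells in a single step. Because the scheme interpolates at departure points instead of differencing neighboring values, the step size is not limited by a CFL condition, which is the usual motivation for choosing semi-Lagrangian time integration.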
