Spatially Covariant Image Registration with Text Prompts
Xiang Chen, Min Liu, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li, Hang Zhang
TL;DR
This work addresses the efficiency and accuracy gap in deformable medical image registration by introducing textSCF, a framework that combines spatially covariant filters with text-driven anatomical prompts encoded via CLIP. By mapping anatomical-region prompts to per-voxel filter weights through a three-branch architecture (text, mask, and feature branches), the method produces region-aware deformation fields that preserve discontinuities between organs while remaining computationally efficient. Empirical results on brain MRI (OASIS) and abdominal CT demonstrate state-of-the-art Dice scores and favorable smoothness (SDlogJ), with notable gains when incorporating external segmentation and semantic text embeddings; the approach also shows transferability across regions and architectures and scales down parameters with minimal accuracy loss. The work highlights the practical impact of combining visual-language priors with spatially covariant priors to improve registration in resource-constrained clinical settings, offering a pathway toward more robust, interpretable deformable registration in multi-organ contexts.
Abstract
Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduces textSCF, a novel method that integrates spatially covariant filters and textual anatomical prompts encoded by visual-language models, to fill this gap. This approach optimizes an implicit function that correlates text embeddings of anatomical regions to filter weights, relaxing the typical translation-invariance constraint of convolutional operations. TextSCF not only boosts computational efficiency but can also retain or improve registration accuracy. By capturing the contextual interplay between anatomical regions, it offers impressive inter-regional transferability and the ability to preserve structural discontinuities during registration. TextSCF's performance has been rigorously tested on inter-subject brain MRI and abdominal CT registration tasks, outperforming existing state-of-the-art models in the MICCAI Learn2Reg 2021 challenge and leading the leaderboard. In abdominal registrations, textSCF's larger model variant improved the Dice score by 11.3% over the second-best model, while its smaller variant maintained similar accuracy but with an 89.13% reduction in network parameters and a 98.34% decrease in computational operations.
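The central idea, an implicit function that maps text embeddings of anatomical regions to filter weights so that the filter applied at a voxel depends on which region the voxel belongs to, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `implicit_filter_weights` and `spatially_covariant_filter`, the two-layer MLP, the 1x1 (per-voxel) filter shape, and all dimensions are assumptions chosen for illustration, and random vectors stand in for the text embeddings that a visual-language model such as CLIP would produce.

```python
import numpy as np


def implicit_filter_weights(text_emb, w1, w2):
    """Map region text embeddings (K, E) to flattened filters (K, C_out * C_in).

    A two-layer MLP is assumed here as the implicit function.
    """
    hidden = np.maximum(text_emb @ w1, 0.0)  # ReLU
    return hidden @ w2


def spatially_covariant_filter(features, region_probs, region_filters, c_out):
    """Apply a different 1x1 filter at every voxel.

    features:       (N, C_in) feature vectors for N flattened voxels
    region_probs:   (N, K) soft or one-hot segmentation over K regions
    region_filters: (K, C_out * C_in) filters produced from text embeddings
    returns:        (N, C_out), e.g. a 3-component displacement per voxel
    """
    n_vox, c_in = features.shape
    # Blend the region filters with the segmentation to get per-voxel filters.
    per_voxel = (region_probs @ region_filters).reshape(n_vox, c_out, c_in)
    # Matrix-vector product at each voxel, so weights vary with location.
    return np.einsum("noc,nc->no", per_voxel, features)


rng = np.random.default_rng(0)
n_regions, emb_dim, hidden_dim, c_in, c_out = 3, 8, 16, 4, 3

# Placeholder text embeddings (assumption: a real system would encode prompts).
text_emb = rng.normal(size=(n_regions, emb_dim))
w1 = rng.normal(size=(emb_dim, hidden_dim))
w2 = rng.normal(size=(hidden_dim, c_out * c_in))
region_filters = implicit_filter_weights(text_emb, w1, w2)

labels = np.array([0, 0, 1, 2, 2])          # region label of each voxel
region_probs = np.eye(n_regions)[labels]    # one-hot segmentation
features = rng.normal(size=(labels.size, c_in))

displacement = spatially_covariant_filter(
    features, region_probs, region_filters, c_out
)
print(displacement.shape)  # (5, 3)
```

Because the filter at each voxel is selected by its region rather than shared across the whole image, two neighboring voxels on opposite sides of an organ boundary can receive unrelated displacements, which is one way to read the abstract's claim about preserving structural discontinuities.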
