Encoder-Only Image Registration
Xiang Chen, Renjiu Hu, Jinwei Zhang, Yuxi Zhang, Xinyao Yu, Min Liu, Yaonan Wang, Hang Zhang
TL;DR
EOIR is an encoder-only image registration framework that decouples feature learning from flow estimation to improve the accuracy-efficiency trade-off, particularly under large deformations. Motivated by an analysis based on the Horn-Schunck optical flow equation and a linearization-harmonization principle, EOIR uses a lightweight 3-layer encoder and a multi-level Hadamard-based flow estimator within a Laplacian feature pyramid, composing the deformation fields across levels to maintain diffeomorphism. The method achieves state-of-the-art efficiency-accuracy and accuracy-smoothness trade-offs across six diverse datasets while remaining scalable enough for large-scale deployment; it also shows strong zero-shot generalization and competitive performance on 2D multimodal tasks. Limitations include reduced multimodal performance with the lightweight encoder and difficulty with very small structures, which suggests future work on stronger encoders or priors. Overall, EOIR provides a robust, memory-efficient backbone for diffeomorphic registration that can scale to large volumetric datasets and resource-constrained environments.
Abstract
Learning-based techniques have significantly improved the accuracy and speed of deformable image registration. However, challenges such as reducing computational complexity and handling large deformations persist. To address these challenges, we use the Horn-Schunck optical flow equation to analyze how convolutional neural networks (ConvNets) influence registration performance. Supported by prior studies and our own experiments, we observe that ConvNets play two key roles in registration: linearizing local intensities and harmonizing global contrast variations. Based on these insights, we propose the Encoder-Only Image Registration (EOIR) framework, designed to achieve a better accuracy-efficiency trade-off. EOIR separates feature learning from flow estimation, employing only a 3-layer ConvNet for feature extraction and a set of 3-layer flow estimators to construct a Laplacian feature pyramid, progressively composing diffeomorphic deformations under a large-deformation model. Results on five datasets across different modalities and anatomical regions demonstrate EOIR's effectiveness, achieving superior accuracy-efficiency and accuracy-smoothness trade-offs. At comparable accuracy, EOIR is more efficient and produces smoother deformations; at comparable efficiency and smoothness, it is more accurate. The source code of EOIR is publicly available at https://github.com/XiangChen1994/EOIR.
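To illustrate what "progressively composing deformations" can mean in practice, the following is a minimal 1D sketch in NumPy, not the EOIR implementation. It assumes each pyramid level yields a displacement field u, so that the map is phi(x) = x + u(x), and that levels are combined as phi_coarse ∘ phi_fine; the composition order, the linear interpolation, and the helper names `compose_1d` and `is_orientation_preserving` are assumptions made for illustration only.

```python
import numpy as np


def compose_1d(u_coarse, u_fine):
    """Displacement of phi_coarse(phi_fine(x)), where phi(x) = x + u(x).

    Hypothetical helper: both inputs are 1D displacement fields sampled
    on the same integer grid 0, 1, ..., n-1.
    """
    x = np.arange(u_fine.size, dtype=float)
    # Sample the coarse displacement where the fine map sends each point.
    # np.interp interpolates linearly and clamps outside the grid.
    u_coarse_warped = np.interp(x + u_fine, x, u_coarse)
    # phi_coarse(phi_fine(x)) - x = u_fine(x) + u_coarse(x + u_fine(x))
    return u_fine + u_coarse_warped


def is_orientation_preserving(u):
    """True if x + u(x) is strictly increasing, i.e. the 1D map has no folding."""
    x = np.arange(u.size, dtype=float)
    return bool(np.all(np.diff(x + u) > 0))


if __name__ == "__main__":
    grid = np.arange(8, dtype=float)
    u_coarse = 0.1 * grid        # smooth stretch, e.g. from a coarse level
    u_fine = np.ones_like(grid)  # unit shift, e.g. from a finer level
    u_total = compose_1d(u_coarse, u_fine)
    print(u_total)
    print(is_orientation_preserving(u_total))
```

Composing the maps, rather than simply adding the displacements, accounts for the fact that the coarse field must be read at the already-displaced location. In 1D, the monotonicity check plays the role of the positive Jacobian determinant that is commonly used to assess folding in 2D and 3D registration.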
