A Recursive Pyramidal Algorithm for Solving the Image Registration Problem
Stefan Dirnstorfer
TL;DR
The paper reframes image registration as estimating a displacement field $d$ that aligns two images by minimizing $\| f_1 - T(f_2, d) \|$ with $T(f,d)(x)=f(x-d(x))$ under a velocity bound $M$ and a continuity bound $\lambda$. It introduces a recursive, multi-scale scheme that first computes a coarse disparity on downsampled images and then refines it with a CNN-predicted residual, summing the two components to obtain the final $d$; recursion ends when images are too small, maintaining the slope bound. The approach is demonstrated on stereopsis with a single CNN operating on $19\times15$ windows and about $5.5\times 10^5$ parameters, trained on only 74 Middlebury pairs, achieving competitive performance on continuous regions while highlighting limitations near occlusions. The contribution emphasizes simplicity, data efficiency, and a lightweight implementation that can serve as a foundation for broader image-registration tasks and multi-sensor extensions, especially where data, time, or code constraints are critical.
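The warp operator and the registration objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes grayscale images stored as 2D arrays, a purely horizontal displacement (as in rectified stereo), and linear interpolation with edge clamping. The function names `warp` and `registration_error` are hypothetical.

```python
import numpy as np

def warp(f, d):
    """Evaluate T(f, d)(x) = f(x - d(x)) for a horizontal displacement field d.

    Assumption: displacement acts along image rows only; values are
    linearly interpolated and clamped to the edge pixel outside the image.
    """
    h, w = f.shape
    xs = np.arange(w, dtype=float)
    out = np.empty((h, w), dtype=float)
    for row in range(h):
        # Sample row `row` of f at the shifted positions x - d(x).
        out[row] = np.interp(xs - d[row], xs, f[row])
    return out

def registration_error(f1, f2, d):
    """Euclidean norm of f1 - T(f2, d), the quantity to be minimized."""
    return np.linalg.norm(f1 - warp(f2, d))

# Usage: f1 is f2 shifted right by two pixels, so d = 2 aligns the interior.
rng = np.random.default_rng(0)
f2 = rng.random((4, 10))
f1 = np.roll(f2, 2, axis=1)
d = np.full(f2.shape, 2.0)
aligned = warp(f2, d)
```

Away from the left border, where clamping takes over, `aligned` matches `f1` exactly because the shift is an integer number of pixels.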
Abstract
Image registration is the problem of finding a transformation that aligns two images such that corresponding points are in the same location. This paper introduces a simple, end-to-end trainable algorithm that is implementable in a few lines of Python code. The approach is shown to work with very little training data and training time, while achieving accurate results in some settings. An example application to stereo vision was trained from 74 images on a 19x15 input window. With just a dozen lines of Python code, this algorithm excels in brevity and may serve as a good starting point in related scenarios where training data, training time, or code complexity is limited.
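The recursive structure summarized above might look roughly like the following sketch. It is written under several assumptions that are not taken from the paper: the pyramid halves the resolution at each level by 2x2 averaging, the coarse displacement is upsampled by nearest-neighbour repetition with its values doubled, and the recursion stops once an image is smaller than a 15-row by 19-column window. The callable `predict_residual` is a hypothetical stand-in for the trained CNN, and `downsample`, `upsample`, and `register` are illustrative names.

```python
import numpy as np

def downsample(f):
    """Halve the resolution by averaging 2x2 blocks (odd edges are cropped)."""
    h, w = f.shape[0] // 2 * 2, f.shape[1] // 2 * 2
    f = f[:h, :w]
    return 0.25 * (f[0::2, 0::2] + f[1::2, 0::2] + f[0::2, 1::2] + f[1::2, 1::2])

def upsample(d, shape):
    """Nearest-neighbour upsampling of a displacement field to `shape`.

    Displacements are measured in pixels, so they double when the
    resolution doubles.
    """
    rows = np.minimum(np.arange(shape[0]) // 2, d.shape[0] - 1)
    cols = np.minimum(np.arange(shape[1]) // 2, d.shape[1] - 1)
    return 2.0 * d[np.ix_(rows, cols)]

def register(f1, f2, predict_residual, min_size=(15, 19)):
    """Coarse-to-fine displacement estimate: coarse solution plus residual."""
    if f1.shape[0] < min_size[0] or f1.shape[1] < min_size[1]:
        # Base case: the images are too small, so assume zero displacement.
        return np.zeros(f1.shape)
    coarse = register(downsample(f1), downsample(f2), predict_residual, min_size)
    d0 = upsample(coarse, f1.shape)
    # Hypothetical refinement step: the predictor sees both images and d0.
    return d0 + predict_residual(f1, f2, d0)

# Usage with a stub predictor that always returns a residual of one pixel.
stub = lambda f1, f2, d0: np.ones(f1.shape)
images = np.zeros((32, 40))
d = register(images, images, stub)
```

With a 32x40 input the recursion refines two levels (16x20 and 32x40) before reaching the base case, so the stub yields a constant field of 2 * 1 + 1 = 3 pixels. In practice the stub would be replaced by a trained network.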
