A Unified Formula for Affine Transformations between Calibrated Cameras
Levente Hajder
TL;DR
The paper introduces a unified closed-form expression for the local affine transformation between patches observed in two calibrated views. The transformation depends on the relative pose $(\mathbf R,\mathbf t)$, the image coordinates, and the tangent plane of the observed surface, given by its normal $\mathbf n$ and distance $d$. The key result, $\mathbf A = \frac{1}{s}\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$, expresses the affine mapping as a function of rotation, translation, plane geometry, and a scale factor $s = (\mathbf r_3^T + \frac{t_z}{d}\mathbf n^T) \mathbf p_1$, and it can be decomposed into three meaningful $2 \times 2$ components. The formulation is checked against the standard stereo case, where it reduces to known results for $\mathbf R=\mathbf I$ and $\mathbf t=[t_x,0,0]^T$, and the interpretation of $d$ in the implicit plane equation is discussed. The work provides a general, extensible framework (a 'parent equation') from which further geometric special cases can be derived. Future directions include planar motion models, pure translation or pure rotation, and small-baseline approximations, with the aim of supporting robust feature tracking and direct 3D reconstruction in calibrated setups.
Abstract
In this technical note, we derive a closed-form expression for the affine transformation mapping local image patches between two calibrated views. We show that the transformation is a function of the relative camera pose, the image coordinates, and the local surface normal.
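As an illustration of how such a transformation can be evaluated numerically, the following is a minimal sketch rather than the note's own derivation. It assumes normalized image coordinates $\mathbf p_1 = [u_1, v_1, 1]^T$, the motion model $\mathbf X_2 = \mathbf R \mathbf X_1 + \mathbf t$, and a tangent plane $\mathbf n^T \mathbf X_1 = d$ in the first camera frame, so that the plane induces the homography $\mathbf H = \mathbf R + \frac{1}{d}\mathbf t \mathbf n^T$, whose third row reproduces the scale factor $s$ quoted in the TL;DR. The affine matrix is then taken as the Jacobian of the homography mapping at $\mathbf p_1$; identifying the numerators with the entries $b_{ij}$ is an assumption, and the function name `affine_from_pose` is hypothetical.

```python
def affine_from_pose(R, t, n, d, p1):
    """Local affine transformation at the normalized point p1 = (u1, v1).

    Assumed conventions (not taken from the note itself):
    X2 = R X1 + t, and the tangent plane satisfies n . X1 = d.
    R is a 3x3 nested list; t and n are length-3 sequences.
    Returns the 2x2 matrix A and the mapped point (u2, v2).
    """
    u1, v1 = p1
    ph = (u1, v1, 1.0)  # homogeneous point in the first view

    # Plane-induced homography H = R + t n^T / d.
    H = [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]

    # Scale factor s = (r_3^T + (t_z / d) n^T) p_1, i.e. third row of H times p_1.
    s = sum(H[2][j] * ph[j] for j in range(3))

    # Corresponding point in the second view.
    u2 = sum(H[0][j] * ph[j] for j in range(3)) / s
    v2 = sum(H[1][j] * ph[j] for j in range(3)) / s

    # Jacobian of the mapping (u1, v1) -> (u2, v2), written as (1/s) * [b_ij].
    A = [
        [(H[0][0] - H[2][0] * u2) / s, (H[0][1] - H[2][1] * u2) / s],
        [(H[1][0] - H[2][0] * v2) / s, (H[1][1] - H[2][1] * v2) / s],
    ]
    return A, (u2, v2)


# Standard stereo case: R = I, t = [t_x, 0, 0]^T (illustrative values).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A_stereo, p2 = affine_from_pose(
    identity, (0.5, 0.0, 0.0), (0.6, 0.0, 0.8), 2.0, (0.1, -0.2)
)
print(A_stereo, p2)
```

Under these assumed conventions the stereo case gives $s = 1$ and $\mathbf A = \begin{bmatrix} 1 + t_x n_x / d & t_x n_y / d \\ 0 & 1 \end{bmatrix}$, so the patch is only scaled and sheared along the horizontal axis. If the note uses a different sign convention for the plane equation, the sign of the $\frac{1}{d}\mathbf t \mathbf n^T$ term changes accordingly.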
