Möbius Transform for Mitigating Perspective Distortions in Representation Learning
Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah
TL;DR
This work tackles perspective distortion (PD) in real-world vision by introducing MPD, a Möbius-transform-based distortion model that synthesizes perspective-like changes without estimating camera intrinsics/extrinsics or using real distorted data. It also provides ImageNet-PD as a dedicated robustness benchmark. MPD applies a four-parameter Möbius transform $\Phi(z)=\frac{az+b}{cz+d}$ to image coordinates in the complex plane and then maps the result back to the pixel grid. It focuses on the PD-controlling parameter $c$, whose components $c_{real}$ and $c_{imag}$ steer horizontal and vertical distortion, and offers an optional integrated padding variant. The method improves robustness for both supervised and self-supervised learning on ImageNet-PD and on existing PD benchmarks (ImageNet-E, ImageNet-X) while preserving performance on standard data. It also generalizes to practical PD-affected tasks such as crowd counting, fisheye transfer learning, person re-identification, and object detection (via MPD-CC, MPD-AutoCrowd, VOC-360 transfer, Clip-ReIdent, and MPD-OD). The work shows that MPD outperforms traditional augmentations on PD benchmarks and provides theoretical grounding via a conformality-like property. Overall, MPD offers a parameterized, data-efficient way to simulate and defend against perspective distortions, suggesting a practical path toward PD-robust representation learning in real-world deployments.
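The coordinate mapping described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the normalization of pixel coordinates to $[-1, 1]$, the defaults $a = d = 1$, $b = 0$, nearest-neighbor sampling, and zero-filling of unmapped pixels are all assumptions made for illustration. It uses inverse mapping, so each output pixel $w$ samples the source image at $\Phi^{-1}(w) = \frac{dw-b}{-cw+a}$.

```python
import numpy as np

def mobius(z, a=1+0j, b=0j, c=0j, d=1+0j):
    """Forward Möbius transform Phi(z) = (a z + b) / (c z + d)."""
    return (a * z + b) / (c * z + d)

def mobius_inverse(w, a=1+0j, b=0j, c=0j, d=1+0j):
    """Inverse transform z = (d w - b) / (-c w + a)."""
    return (d * w - b) / (-c * w + a)

def mobius_warp(image, c, a=1+0j, b=0j, d=1+0j):
    """Warp an H x W (x C) array; hypothetical helper, not the paper's code."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Assumption: pixel grid is normalized to [-1, 1] on both axes,
    # with x as the real part and y as the imaginary part.
    out_z = (2.0 * xs / max(w - 1, 1) - 1.0) + 1j * (2.0 * ys / max(h - 1, 1) - 1.0)
    # Inverse mapping: find the source coordinate for every output pixel.
    denom = -c * out_z + a
    safe = np.abs(denom) > 1e-12  # skip points sent to infinity
    src_z = (d * out_z - b) / np.where(safe, denom, 1.0)
    # Back to pixel indices; clip so far-away points cannot overflow the cast.
    sx = np.clip((src_z.real + 1.0) * (w - 1) / 2.0, -1, w)
    sy = np.clip((src_z.imag + 1.0) * (h - 1) / 2.0, -1, h)
    sx_i = np.rint(sx).astype(np.int64)
    sy_i = np.rint(sy).astype(np.int64)
    valid = safe & (sx_i >= 0) & (sx_i < w) & (sy_i >= 0) & (sy_i < h)
    # Pixels whose source falls outside the image stay zero (black).
    out = np.zeros_like(image)
    out[valid] = image[sy_i[valid], sx_i[valid]]
    return out
```

Under these assumptions, calling `mobius_warp(img, c=0.2+0j)` would vary only the real part of $c$ and `mobius_warp(img, c=0.2j)` only the imaginary part, while `c=0` leaves the image unchanged. The zero-filled border here is a simplification and does not reproduce the integrated padding variant mentioned above.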
Abstract
Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is a challenging task that prevents synthesizing perspective distortion. The lack of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods turn other computer vision tasks into multi-step pipelines and underperform. In this work, we propose mitigating perspective distortion (MPD) by employing fine-grained parameter control on a specific family of Möbius transforms to model real-world distortion without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data. We also present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, for evaluating the robustness of deep learning models. The proposed method outperforms existing approaches on the ImageNet-E and ImageNet-X benchmarks. Additionally, it significantly improves performance on ImageNet-PD while maintaining performance on the standard data distribution. Notably, our method shows improved performance on three PD-affected real-world applications (crowd counting, fisheye image recognition, and person re-identification) and on one challenging PD-affected CV task, object detection. The source code, dataset, and models are available on the project webpage at https://prakashchhipa.github.io/projects/mpd.
