Möbius Transform for Mitigating Perspective Distortions in Representation Learning

Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah

TL;DR

This work tackles perspective distortion (PD) in real-world vision by introducing MPD, a Möbius-transform-based distortion model that synthesizes perspective-like changes without estimating camera intrinsics/extrinsics or using real distorted data. It also provides ImageNet-PD as a dedicated robustness benchmark. MPD applies a four-parameter Möbius transform $\Phi(z)=\frac{az+b}{cz+d}$ to image coordinates in the complex plane and then maps the result back to the pixel grid, with an optional integrated padding variant. It focuses on the PD-controlling parameter $c$, whose components $c_{real}$ and $c_{imag}$ steer horizontal and vertical distortion, respectively. The method improves robustness for both supervised and self-supervised learning on ImageNet-PD and on existing PD benchmarks (ImageNet-E, ImageNet-X) while preserving performance on standard data. It also generalizes to practical PD-affected tasks: crowd counting, fisheye transfer learning, person re-identification, and object detection (via MPD-CC, MPD-AutoCrowd, VOC-360 transfer, Clip-ReIdent, and MPD-OD). MPD outperforms traditional augmentations on PD benchmarks, and the work provides theoretical grounding via a conformality-like property. Overall, MPD offers a parameterized, data-efficient way to simulate and defend against perspective distortions, suggesting a practical path toward PD-robust computer vision systems in real-world deployments.
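
The coordinate pipeline described above (pixels to complex plane, apply $\Phi$, back to the pixel grid) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, pixel coordinates are assumed to be normalized to $[-1, 1]$ with $x$ on the real axis and $y$ on the imaginary axis, $a = d = 1$ and $b = 0$ are assumed defaults so that only $c$ varies, the warp is done by inverse mapping with nearest-neighbour sampling, and out-of-range pixels are left black. The integrated padding variant is not modelled.

```python
import numpy as np


def mobius(z, a=1 + 0j, b=0j, c=0j, d=1 + 0j):
    """Phi(z) = (a z + b) / (c z + d), applied elementwise to complex z."""
    return (a * z + b) / (c * z + d)


def mobius_inverse(w, a=1 + 0j, b=0j, c=0j, d=1 + 0j):
    """Phi^{-1}(w) = (d w - b) / (-c w + a); valid when a*d - b*c != 0."""
    return (d * w - b) / (-c * w + a)


def mpd_warp(image, c, a=1 + 0j, b=0j, d=1 + 0j):
    """Hypothetical MPD-style warp (illustrative sketch, not the paper's code).

    For every output pixel w, sample the input at Phi^{-1}(w). Output pixels whose
    pre-image falls outside the source image stay black (zero).
    """
    if abs(a * d - b * c) < 1e-12:
        raise ValueError("degenerate Möbius transform: a*d - b*c must be non-zero")
    h, w = image.shape[:2]
    # Assumed normalization: x -> real axis, y -> imaginary axis, both in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    out_coords = xs + 1j * ys
    with np.errstate(divide="ignore", invalid="ignore"):
        src = mobius_inverse(out_coords, a, b, c, d)
    # Back from normalized coordinates to (non-integer) pixel indices;
    # non-finite values (at the pole) are pushed out of range.
    src_x = np.nan_to_num((src.real + 1) * (w - 1) / 2, nan=-1.0, posinf=-1.0, neginf=-1.0)
    src_y = np.nan_to_num((src.imag + 1) * (h - 1) / 2, nan=-1.0, posinf=-1.0, neginf=-1.0)
    valid = (src_x >= 0) & (src_x <= w - 1) & (src_y >= 0) & (src_y <= h - 1)
    out = np.zeros_like(image)
    # Nearest-neighbour sampling keeps the sketch dependency-free.
    ix = np.rint(src_x[valid]).astype(int)
    iy = np.rint(src_y[valid]).astype(int)
    out[valid] = image[iy, ix]
    return out
```

For example, `mpd_warp(img, complex(-0.3, 0.0))` varies only $c_{real}$, while `mpd_warp(img, complex(0.0, -0.3))` varies only $c_{imag}$; with $c = 0$ the warp reduces to the identity. Inverse mapping is used so that every output pixel receives exactly one value and no holes appear, which a forward scatter of input pixels would not guarantee.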

Abstract

Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is challenging, which hinders the synthesis of perspective distortion. The unavailability of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods turn other computer vision tasks into multi-step pipelines and fall short in performance. In this work, we propose mitigating perspective distortion (MPD) by applying fine-grained parameter control to a specific family of Möbius transforms, which models real-world distortion without estimating camera intrinsic and extrinsic parameters and without requiring actual distorted data. We also present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to evaluate the robustness of deep learning models against perspective distortion. The proposed method improves results on the existing benchmarks ImageNet-E and ImageNet-X. Additionally, it significantly improves performance on ImageNet-PD while performing consistently on the standard data distribution. Notably, our method improves performance on three PD-affected real-world applications (crowd counting, fisheye image recognition, and person re-identification) and on one challenging PD-affected CV task (object detection). The source code, dataset, and models are available on the project webpage at https://prakashchhipa.github.io/projects/mpd.
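
The "specific family of Möbius transforms" can be made concrete with a short derivation of the local magnification $|\Phi'(z)|$. As an illustrative assumption (this summary does not state the paper's exact parameterization), fix $a = d = 1$ and $b = 0$ so that only $c = c_{real} + i\,c_{imag}$ varies:

$$\Phi(z) = \frac{az+b}{cz+d}, \qquad \Phi'(z) = \frac{a(cz+d) - c(az+b)}{(cz+d)^2} = \frac{ad - bc}{(cz+d)^2}$$

$$a = d = 1,\; b = 0: \qquad \Phi(z) = \frac{z}{cz+1}, \qquad |\Phi'(z)| = \frac{1}{|1 + cz|^2}$$

$$c = c_{real},\; z = x: \quad |\Phi'(x)| = \frac{1}{(1 + c_{real}\,x)^2}; \qquad c = i\,c_{imag},\; z = iy: \quad |\Phi'(iy)| = \frac{1}{(1 - c_{imag}\,y)^2}$$

Because $ad - bc \neq 0$, $\Phi'$ never vanishes away from the pole $z = -d/c$, so the map is conformal (locally angle-preserving) there, while the magnification $|\Phi'|$ still changes with position: it grows toward one edge and shrinks toward the other, the kind of non-uniform size change that perspective produces. For a purely real $c$ the magnification varies along the horizontal axis, and for a purely imaginary $c$ it varies along the vertical axis, which is one way to see why $c_{real}$ and $c_{imag}$ are associated with horizontal and vertical distortion.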

Paper Structure (32 sections, 13 equations, 28 figures, 23 tables)

Figures (28)

  • Figure 1: MPD synthesizing perspective distortion with different orientations corresponding to the complex parameter $c = c_{real} + i\,c_{imag}$.
  • Figure 2: Controlled scaling of distortion in an MPD-transformed image. The distortion intensity is controlled by $c_{real}$, the real component of the complex parameter $c$, ranging from -0.1 to -0.5, to synthesize perspectively distorted left views of an example image of a 'cat'.
  • Figure 3: Perspectively distorted image examples from the ImageNet-PD benchmark dataset. (a) Original image, (b) Left view (PD-L), (c) Right view (PD-R), (d) Top view (PD-T), (e) Bottom view (PD-B), (f) Left view with integrated padding background (PD-LI), (g) Right view with integrated padding background (PD-RI), (h) Top view with integrated padding background (PD-TI), (i) Bottom view with integrated padding background (PD-BI).
  • Figure 4: Top-1 and Top-5 accuracies of ImageNet-trained models with standard architectures on ImageNet-PD subsets. Blue bars show performance on the original ImageNet validation set, green bars show mean performance on ImageNet-PD subsets with a black background, and orange bars show mean performance on ImageNet-PD subsets with an integrated padding background.
  • Figure 5: Activation maps for the 'beaker' example across ImageNet-PD subsets: standard ResNet50 model (row 1), supervised MPD (row 2), and self-supervised (SSL) MPD (row 3).
  • ...and 23 more figures
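
The controlled scaling in Figure 2 can be quantified with a small numeric sketch. Assuming, for illustration only, the simplified family $\Phi(z) = z / (cz + 1)$ (that is, $a = d = 1$, $b = 0$) on coordinates normalized to $[-1, 1]$, the snippet compares the local magnification $|\Phi'(z)| = 1/|cz + 1|^2$ at the right edge ($z = 1$) and the left edge ($z = -1$) as $c_{real}$ sweeps the range shown in Figure 2. The helper names are hypothetical.

```python
def local_scale(z: complex, c: complex) -> float:
    """|Phi'(z)| = 1 / |c z + 1|^2 for the assumed family Phi(z) = z / (c z + 1)."""
    return 1.0 / abs(c * z + 1) ** 2


def edge_magnification_ratio(c_real: float) -> float:
    """Local magnification at the right edge (z = 1) divided by that at the left edge (z = -1)."""
    c = complex(c_real, 0.0)
    return local_scale(1 + 0j, c) / local_scale(-1 + 0j, c)


if __name__ == "__main__":
    # Sweep the c_real values from Figure 2's caption (-0.1 to -0.5).
    for c_real in (-0.1, -0.2, -0.3, -0.4, -0.5):
        ratio = edge_magnification_ratio(c_real)
        print(f"c_real={c_real:+.1f}  right/left magnification={ratio:.2f}")
```

In closed form the ratio is $\left(\frac{1 - c_{real}}{1 + c_{real}}\right)^2$, which grows monotonically from roughly 1.49 at $c_{real} = -0.1$ to 9 at $c_{real} = -0.5$, so a larger $|c_{real}|$ gives a stronger one-sided stretch. Which side appears enlarged in the rendered image depends on whether the warp is implemented as a forward or an inverse mapping, so this sketch does not claim to reproduce the exact left views in Figure 2.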