Adapting CNNs for Fisheye Cameras without Retraining

Ryan Griffiths; Donald G. Dansereau

TL;DR

CNNs trained on perspective imagery perform poorly on fisheye and other non-perspective cameras, and rectifying the image first either crops the field of view (FOV) or fails to preserve context. Rectified Convolutions (RectConv) replace the fixed sampling grid of each convolution kernel with per-pixel offsets derived from an invertible camera model, $p=f_{3D}(u,v)$, and its inverse, $(u,v)=f_{2D}(p)$, so that kernels align with the local image geometry without retraining. Pre-trained networks (e.g., FCN, DeepLabV3 and DeepLabV3+, FCOS) are converted to RectConv form and show improved segmentation and detection on the Woodscape and PIROPO datasets while covering the full FOV; the approach incurs interpolation overhead but requires no additional training data. This allows existing perspective-trained models to be deployed across diverse camera geometries, a step toward broader applicability of pre-trained vision models, with future extensions to depth and pose estimation and to deconvolution layers.

Abstract

The majority of image processing approaches assume that images are in, or can be rectified to, a perspective projection. However, in many applications it is beneficial to use non-conventional cameras, such as fisheye cameras, that have a larger field of view (FOV). The problem is that these large-FOV images cannot be rectified to a perspective projection without significant cropping of the original image. To address this issue we propose Rectified Convolutions (RectConv), a new approach for adapting pre-trained convolutional networks to operate on new non-perspective images without any retraining. Replacing the convolutional layers of the network with RectConv layers allows the network to see both rectified patches and the entire FOV. We demonstrate RectConv adapting multiple pre-trained networks to perform segmentation and detection on fisheye imagery from two publicly available datasets. Our approach requires no additional data or training, and operates directly on the native image as captured from the camera. We believe this work is a step toward adapting the vast resources available for perspective images to operate across a broad range of camera geometries.
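
To see why a perspective rectification must crop a wide-FOV image, consider a ray at angle $\theta$ from the optical axis. The short derivation below assumes an ideal pinhole model with focal length $f$ and, purely for comparison, an equidistant fisheye model; the symbols and the choice of fisheye model are illustrative assumptions, not taken from the paper.

$$
r_{\text{persp}}(\theta) = f\tan\theta, \qquad r_{\text{fisheye}}(\theta) = f\,\theta, \qquad \frac{d\,r_{\text{persp}}}{d\theta} = \frac{f}{\cos^{2}\theta}.
$$

As $\theta \to 90^\circ$, $r_{\text{persp}}$ diverges, so a perspective image of finite half-width $w$ reaches only $\theta_{\max} = \arctan(w/f) < 90^\circ$. Reducing $f$ raises $\theta_{\max}$, but the radial stretch relative to the image center, $1/\cos^{2}\theta$, is already about 33 at $\theta = 80^\circ$, which is why a smaller focal length trades cropping for distortion.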

Paper Structure (11 sections, 5 equations, 6 figures, 4 tables)

Introduction
Related Work
Rectified Convolutions
RectConv Layers
Effects of Interpolation
Supported Model Architectures
Fine-Tuning
Experiments
Results
Ablation Study
Conclusions

Figures (6)

Figure 1: Example of perspective and cylindrical camera projections applied to a wide field of view fisheye image. Regions in red show areas that are excluded from the rectified projection. Decreasing the focal length can reduce cropping but increases distortion.
Figure 2: An illustration of what regular convolution and RectConv see for a fisheye image at a given position in the image. Blue and green boxes indicate the kernel shapes for regular convolution and RectConv, respectively.
Figure 3: For a given patch, each pixel is converted to 3D space, which is then sampled on a regular planar grid. This grid in 3D space is converted back to image locations that represent the kernel locations for that position.
Figure 4: A histogram of the outputs from a binary classification task showing how RectConv layers result in a bias shift in the outputs.
Figure 5: Comparison of segmentation using an FCN-ResNet101 pre-trained on Cityscapes. The unmodified pre-trained network performs poorly; pre-rectification also performs poorly and suffers from dead zones that could not be included in the rectification; the proposed RectConv performs best while covering the entire image.
...and 1 more figure
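
The procedure in the Figure 3 caption (lift to 3D, sample a regular planar grid, project back) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an equidistant fisheye model with hypothetical intrinsics `F`, `CX`, `CY`, and it assumes the planar grid lies in the plane tangent to the central ray, oriented and scaled from the ray of the horizontally adjacent pixel. The names `f_3d`, `f_2d`, and `kernel_locations` are chosen here to mirror $f_{3D}$ and $f_{2D}$.

```python
import numpy as np

# Hypothetical intrinsics for an equidistant fisheye model (r = F * theta).
F, CX, CY = 300.0, 640.0, 480.0


def f_3d(u, v):
    """Lift pixel (u, v) to a unit ray (x right, y down, z forward)."""
    x, y = u - CX, v - CY
    r = np.hypot(x, y)
    if r < 1e-12:
        return np.array([0.0, 0.0, 1.0])
    theta = r / F
    s = np.sin(theta) / r
    return np.array([x * s, y * s, np.cos(theta)])


def f_2d(p):
    """Project a 3D point p back to pixel coordinates (u, v)."""
    rho = np.hypot(p[0], p[1])
    if rho < 1e-12:
        return np.array([CX, CY])
    r = F * np.arctan2(rho, p[2])
    return np.array([CX + r * p[0] / rho, CY + r * p[1] / rho])


def kernel_locations(u, v, ksize=3):
    """Return a (ksize, ksize, 2) array of (u, v) sampling locations."""
    c = f_3d(u, v)                      # central ray
    n = f_3d(u + 1.0, v)                # ray of the pixel one step to the right
    e1 = n - np.dot(n, c) * c           # tangent-plane axis toward that pixel
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(c, e1)                # second tangent-plane axis
    # Grid spacing on the plane: tan(angle between c and n).
    step = np.linalg.norm(np.cross(c, n)) / np.dot(c, n)
    half = ksize // 2
    locs = np.empty((ksize, ksize, 2))
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            p = c + step * (i * e1 + j * e2)   # regular grid on the plane
            locs[j + half, i + half] = f_2d(p)
    return locs


if __name__ == "__main__":
    print(kernel_locations(900.0, 200.0))
```

Subtracting the regular grid positions from these locations gives per-pixel offsets of the kind a deformable-style convolution would consume; because the locations are generally non-integer, sampling them requires interpolation, which is the overhead mentioned in the TL;DR.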
