Adapting CNNs for Fisheye Cameras without Retraining
Ryan Griffiths, Donald G. Dansereau
TL;DR
CNNs trained on perspective imagery struggle on images from fisheye and other non-perspective cameras, where rectification either crops the image or fails to preserve context. Rectified Convolutions (RectConv) replace the fixed kernel sampling pattern with per-pixel offsets derived from an invertible camera model $p=f_{3D}(u,v)$ and its inverse $u,v=f_{2D}(p)$, so that kernels align with the local image geometry without retraining. The approach converts pre-trained networks (e.g., FCN, DeepLabV3, DeepLabV3+, FCOS) to RectConv form and improves segmentation and detection on the WoodScape and PIROPO datasets while retaining the full FOV; it incurs interpolation-based overhead but needs no additional training data. This enables deploying existing perspective-trained models across diverse camera geometries with minimal data, a step toward broader applicability of pre-trained vision models across camera types. Future extensions include depth/pose estimation and deconvolution layers.
Abstract
The majority of image processing approaches assume images are in, or can be rectified to, a perspective projection. However, in many applications it is beneficial to use non-conventional cameras, such as fisheye cameras, that have a larger field of view (FOV). The problem is that these large-FOV images cannot be rectified to a perspective projection without significant cropping of the original image. To address this issue, we propose Rectified Convolutions (RectConv), a new approach for adapting pre-trained convolutional networks to operate with new non-perspective images, without any retraining. Replacing the convolutional layers of the network with RectConv layers allows the network to see both rectified patches and the entire FOV. We demonstrate RectConv adapting multiple pre-trained networks to perform segmentation and detection on fisheye imagery from two publicly available datasets. Our approach requires no additional data or training, and operates directly on the native image as captured from the camera. We believe this work is a step toward adapting the vast resources available for perspective images to operate across a broad range of camera geometries.
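To make the idea of rectified kernel sampling concrete, the following is a minimal NumPy sketch of how per-pixel offsets could be derived from an invertible camera model. It is not the authors' implementation. It assumes an equidistant fisheye model ($r = f\theta$) for $f_{3D}$ and $f_{2D}$, and the function names `f_3d`, `f_2d`, `rect_offsets` and the parameters `f` (focal length in pixels) and `c` (principal point) are hypothetical. For a given pixel, a $k \times k$ grid is laid out on the plane tangent to that pixel's viewing ray (a local perspective patch) and projected back into the fisheye image; the offsets are the differences between the projected locations and the kernel centre.

```python
import numpy as np

def f_3d(uv, f, c):
    """Pixel -> unit ray, assuming an equidistant fisheye model (r = f * theta)."""
    d = np.asarray(uv, dtype=float) - np.asarray(c, dtype=float)
    r = np.linalg.norm(d, axis=-1, keepdims=True)
    theta = r / f                                   # angle from the optical axis
    dirn = np.divide(d, r, out=np.zeros_like(d), where=r > 0)
    return np.concatenate([np.sin(theta) * dirn, np.cos(theta)], axis=-1)

def f_2d(p, f, c):
    """Ray (any positive scale) -> pixel, inverse of f_3d under the same model."""
    p = np.asarray(p, dtype=float)
    xy = p[..., :2]
    rho = np.linalg.norm(xy, axis=-1, keepdims=True)
    theta = np.arctan2(rho, p[..., 2:3])
    dirn = np.divide(xy, rho, out=np.zeros_like(xy), where=rho > 0)
    return np.asarray(c, dtype=float) + f * theta * dirn

def rect_offsets(uv, k, f, c):
    """Sampling offsets, shape (k*k, 2), for a k x k kernel centred at pixel uv."""
    uv = np.asarray(uv, dtype=float)
    p = f_3d(uv, f, c)
    # Minimal rotation taking the optical axis (0, 0, 1) onto the ray p.
    v = np.cross([0.0, 0.0, 1.0], p)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    R = np.eye(3) + K + K @ K / (1.0 + p[2])        # undefined only for p = -z
    e1, e2 = R[:, 0], R[:, 1]                       # tangent-plane basis at p
    # Regular kernel grid in pixel units: du varies fastest, then dv.
    steps = np.arange(k) - k // 2
    du, dv = np.meshgrid(steps, steps, indexing="xy")
    grid = np.stack([du.ravel(), dv.ravel()], axis=-1).astype(float)
    # One pixel of a perspective patch spans 1/f on the unit tangent plane.
    q = p + (grid[:, :1] * e1 + grid[:, 1:] * e2) / f
    return f_2d(q, f, c) - uv

# Illustrative values only: a 3x3 kernel near the centre and far off-axis.
f_px, centre = 300.0, np.array([640.0, 480.0])
print(rect_offsets(centre, 3, f_px, centre).round(3))
print(rect_offsets(centre + [360.0, 0.0], 3, f_px, centre).round(3))
```

Under this assumed model the offsets near the principal point stay close to a regular grid, while far from the centre they stretch in the tangential direction. Offsets of this kind would then be used to interpolate the feature map at non-integer locations, which is consistent with the interpolation overhead mentioned in the TL;DR.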
