Convolution kernel adaptation to calibrated fisheye
Bruno Berenguel-Baeta, Maria Santos-Villafranca, Jesus Bermudez-Cameo, Alejandro Perez-Yus, Jose J. Guerrero
TL;DR
This work tackles the domain gap between CNNs trained on perspective images and fisheye imagery by introducing camera-calibrated deformable convolutions based on the Kannala-Brandt projection. Kernels are warped according to calibration-derived offsets, so that receptive fields follow the radial distortion and perspective-trained networks can be transferred with limited fine-tuning. The authors implement calibrated kernels for radially distorted fisheye cameras and demonstrate improvements in monocular depth estimation and semantic segmentation over standard convolutions, especially at larger distortion radii. This approach makes it possible to leverage large perspective datasets for fisheye scene understanding without collecting a massive new dataset for each camera calibration, and it could potentially be extended to other projection models.
Abstract
Convolution kernels are the basic structural component of convolutional neural networks (CNNs). In recent years, there has been growing interest in fisheye cameras for many applications. However, the radially symmetric projection model of these cameras produces strong distortions that degrade the performance of CNNs, especially when the field of view is very large. In this work, we tackle this problem by proposing a method that leverages the camera calibration to deform the convolution kernel and adapt it to the distortion. That way, the receptive field of the convolution is similar to that of standard convolutions in perspective images, allowing us to take advantage of networks pre-trained on large perspective datasets. We show that, with just a brief fine-tuning stage on a small dataset, we improve the performance of the network for the calibrated fisheye with respect to standard convolutions in depth estimation and semantic segmentation.
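To make the idea of calibration-derived kernel offsets concrete, the following is a minimal sketch of one plausible construction, not necessarily the exact procedure used in this work. It assumes a Kannala-Brandt radial mapping of the form r(θ) = f(θ + k1θ³ + k2θ⁵) with a single focal length f and principal point c. It back-projects the kernel centre to a viewing ray, lays a regular kernel grid on the plane tangent to that ray (as a perspective camera with focal length f would see it), and projects the grid back into the fisheye image. The function names `kb_radius`, `kb_theta`, and `kernel_offsets`, as well as the parameter values in the usage note, are hypothetical.

```python
import numpy as np

def kb_radius(theta, f, k):
    """Assumed Kannala-Brandt mapping: incidence angle -> image radius (pixels)."""
    k1, k2 = k
    return f * (theta + k1 * theta**3 + k2 * theta**5)

def kb_theta(r, f, k, iters=20):
    """Invert kb_radius with Newton's method, starting from the equidistant guess."""
    k1, k2 = k
    theta = r / f
    for _ in range(iters):
        g = f * (theta + k1 * theta**3 + k2 * theta**5) - r
        dg = f * (1 + 3 * k1 * theta**2 + 5 * k2 * theta**4)
        theta = theta - g / dg
    return theta

def kernel_offsets(u, v, f, c, k, ksize=3):
    """Offsets (du, dv) for each tap of a ksize x ksize kernel centred at (u, v).

    Returns an array of shape (ksize, ksize, 2): the displacement of each tap
    from its position on the regular pixel grid.
    """
    cx, cy = c
    # Back-project the kernel centre to spherical angles (theta, phi).
    theta = kb_theta(np.hypot(u - cx, v - cy), f, k)
    phi = np.arctan2(v - cy, u - cx)
    # Rodrigues rotation taking the optical axis to the centre ray.
    ax, ay = -np.sin(phi), np.cos(phi)
    K = np.array([[0.0, 0.0, ay], [0.0, 0.0, -ax], [-ay, ax, 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    # Regular grid on the tangent plane, one perspective pixel (1/f) apart.
    half = ksize // 2
    j, i = np.mgrid[-half:half + 1, -half:half + 1]  # j: rows (v), i: columns (u)
    rays = np.stack([i / f, j / f, np.ones_like(i, dtype=float)], axis=-1) @ R.T
    # Project the rotated rays through the fisheye model.
    th = np.arctan2(np.hypot(rays[..., 0], rays[..., 1]), rays[..., 2])
    ph = np.arctan2(rays[..., 1], rays[..., 0])
    rr = kb_radius(th, f, k)
    us, vs = cx + rr * np.cos(ph), cy + rr * np.sin(ph)
    return np.stack([us - (u + i), vs - (v + j)], axis=-1)
```

With illustrative values such as `f = 300.0`, `c = (640.0, 480.0)`, and `k = (-0.02, 0.001)`, the offsets are close to zero at the principal point and grow with the distance from it, which matches the intuition that the kernel only needs to deform noticeably where the distortion is strong. Offsets of this shape are the kind of input a deformable convolution layer consumes, although the tensor layout expected by a given framework would need to be checked separately.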
