Raw or Cooked? Object Detection on RAW Images
William Ljungbergh, Joakim Johnander, Christoffer Petersson, Michael Felsberg
TL;DR
The paper challenges the assumption that camera image signal processing (ISP) pipelines, which are optimized for visually pleasing RGB images, are also optimal for deep vision tasks. It proposes a Bayer-pattern-preserving downsampling stage plus three lightweight, learnable RAW processing operations ($F_\gamma$, $F_{erf}$, $F_{YJ}$) that are trained end-to-end with an object detector, and demonstrates improvements on the PASCALRAW dataset. The Learnable Yeo-Johnson operation ($F_{YJ}$) achieves the highest accuracy, $AP=52.6$, surpassing the RGB baseline by about $2.1$ AP points, while the naïve RAW RGGB input performs much worse. These results indicate that task-driven optimization of RAW-to-feature transformations can improve object detection, particularly under challenging lighting conditions, with implications for camera pipelines and low-light vision systems.
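To make the $F_{YJ}$ operation more concrete, the following is a minimal sketch of the standard Yeo-Johnson power transform applied to RAW pixel values. It is not the authors' implementation: the function names `yeo_johnson` and `transform_raw`, the assumption that pixel values are normalized to $[0, 1]$, and the example value of the parameter `lam` are all illustrative. In the learnable setting described above, `lam` would presumably be a trainable parameter updated by backpropagation together with the detector weights; here it is a plain float.

```python
import math


def yeo_johnson(x: float, lam: float, eps: float = 1e-8) -> float:
    """Standard Yeo-Johnson power transform of a single value.

    `lam` is the shape parameter. In a learnable variant it would be
    optimized jointly with the detector (assumption; here it is fixed).
    """
    if x >= 0.0:
        # Non-negative branch: log(x + 1) in the limit lam -> 0.
        if abs(lam) < eps:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    # Negative branch: -log(1 - x) in the limit lam -> 2.
    if abs(lam - 2.0) < eps:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def transform_raw(pixels, lam):
    """Apply the transform elementwise to a flat list of RAW intensities.

    Pixels are assumed to be normalized to [0, 1] (illustrative assumption).
    """
    return [yeo_johnson(p, lam) for p in pixels]


if __name__ == "__main__":
    raw = [0.0, 0.01, 0.1, 0.5, 1.0]  # hypothetical normalized RAW values
    print(transform_raw(raw, lam=0.5))
```

With `lam = 1` the transform reduces to the identity, and for non-negative inputs values of `lam` below 1 compress bright intensities relative to dark ones, which is qualitatively similar to what a gamma curve does in a conventional ISP.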
Abstract
Images fed to a deep neural network have generally undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP should instead be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous work on this topic and propose a new learnable operation that enables an object detector to achieve superior performance compared to both previous work and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.
