RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images
Ziteng Cui, Tatsuya Harada
TL;DR
This work addresses the domain gap between camera RAW data and sRGB-pretrained vision models by introducing RAW-Adapter, a two-tier adapter framework. Input-level adapters predict the parameters of differentiable ISP stages using query adaptive learning and implicit neural representations. Model-level adapters fuse ISP-stage features into the backbone, so the network has access to intermediate ISP information rather than only the final rendered image. Across PASCAL RAW, LOD RAW, and ADE20K RAW (including synthetic and real-world lighting variations), RAW-Adapter achieves state-of-the-art results in object detection and semantic segmentation while maintaining a compact parameter budget. The approach offers a general, efficient way to leverage large-scale sRGB pretraining for RAW-based computer vision tasks, with unified RAW adaptation and multi-task decoding left as future work.
Abstract
sRGB images are now the predominant choice for pre-training visual models in computer vision research, owing to their ease of acquisition and efficient storage. Meanwhile, the advantage of RAW images lies in the rich physical information they retain under variable and challenging real-world lighting conditions. For computer vision tasks that operate directly on camera RAW data, most existing studies integrate an image signal processor (ISP) with a backend network, yet often overlook the interaction between the ISP stages and the subsequent network. Drawing inspiration from ongoing adapter research in NLP and CV, we introduce RAW-Adapter, a novel approach for adapting sRGB pre-trained models to camera RAW data. RAW-Adapter comprises input-level adapters, which employ learnable ISP stages to adjust RAW inputs, and model-level adapters, which build connections between the ISP stages and the subsequent high-level network. Additionally, RAW-Adapter is a general framework that can be applied within various computer vision frameworks. Extensive experiments under different lighting conditions show that our algorithm achieves state-of-the-art (SOTA) performance, demonstrating its effectiveness and efficiency across a range of real-world and synthetic datasets.
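To make the idea of learnable ISP stages more concrete, the following is a minimal sketch of a parameterized ISP pipeline, not the authors' implementation. The choice of stages (gain, white balance, color correction matrix, gamma), the `ISPParams` container, and the `apply_isp` function are all assumptions made for illustration. The input is assumed to be already demosaiced, linear, and normalized to [0, 1]. In RAW-Adapter the parameters would be predicted per image by the input-level adapters; here they are fixed by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[float, float, float]


@dataclass
class ISPParams:
    """Hypothetical per-image ISP parameters (an adapter would predict these)."""
    gain: float                      # global exposure gain
    wb: Tuple[float, float, float]   # per-channel white-balance gains
    ccm: Tuple[Pixel, Pixel, Pixel]  # 3x3 color correction matrix, as rows
    gamma: float                     # tone-curve exponent


def clip01(x: float) -> float:
    """Clamp a value to the [0, 1] range."""
    return min(1.0, max(0.0, x))


def apply_isp(raw: List[Pixel], p: ISPParams) -> List[List[Pixel]]:
    """Run the stages in order and return every intermediate image.

    Returning the intermediates mirrors the idea that a model-level adapter
    could consume ISP-stage outputs, not only the final image.
    """
    gained = [tuple(clip01(c * p.gain) for c in px) for px in raw]
    balanced = [tuple(clip01(c * w) for c, w in zip(px, p.wb)) for px in gained]
    corrected = [
        tuple(clip01(sum(m * c for m, c in zip(row, px))) for row in p.ccm)
        for px in balanced
    ]
    toned = [tuple(c ** (1.0 / p.gamma) for c in px) for px in corrected]
    return [gained, balanced, corrected, toned]


# Example: one dark, green-tinted pixel with hand-picked parameters.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
params = ISPParams(gain=2.0, wb=(2.0, 1.0, 1.5), ccm=identity, gamma=2.0)
stages = apply_isp([(0.1, 0.2, 0.1)], params)
```

Each stage is a simple arithmetic function of its parameters, which is what would allow gradients from a downstream detection or segmentation loss to reach the parameter predictor in a tensor-based implementation. The sketch omits the query adaptive learning and implicit neural representation components entirely.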
