RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images

Ziteng Cui, Tatsuya Harada

TL;DR

This work addresses the domain gap between RAW camera data and sRGB-pretrained vision models by introducing RAW-Adapter, a two-tier adapter framework. It combines input-level adapters that predict differentiable ISP parameters using query adaptive learning and implicit neural representations with model-level adapters that fuse ISP-stage features into the backbone, enabling effective RAW-to-RGB adaptation. Across PASCAL RAW, LOD RAW, and ADE20K RAW (including synthetic and real-world lighting variations), RAW-Adapter achieves state-of-the-art results in object detection and semantic segmentation while maintaining a compact parameter budget. The approach offers a general, efficient pathway to leverage large-scale sRGB pretraining for RAW-based CV tasks, with potential for unified RAW adaptation and multi-task decoding in future work.

Abstract

sRGB images are now the predominant choice for pre-training visual models in computer vision research, owing to their ease of acquisition and efficient storage. Meanwhile, the advantage of RAW images lies in the rich physical information they retain under challenging, variable real-world lighting conditions. For computer vision tasks that operate directly on camera RAW data, most existing studies integrate an image signal processor (ISP) with backend networks, yet often overlook the interaction between the ISP stages and the subsequent networks. Drawing inspiration from ongoing adapter research in NLP and CV, we introduce RAW-Adapter, a novel approach for adapting sRGB pre-trained models to camera RAW data. RAW-Adapter comprises input-level adapters that employ learnable ISP stages to adjust RAW inputs, as well as model-level adapters that build connections between the ISP stages and the subsequent high-level networks. Additionally, RAW-Adapter is a general framework that can be used in various computer vision frameworks. Extensive experiments under different lighting conditions show our algorithm's state-of-the-art (SOTA) performance, demonstrating its effectiveness and efficiency across a range of real-world and synthetic datasets.
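
To make the idea of "learnable ISP stages" concrete, the following is a minimal sketch of an ISP whose stages are driven entirely by a handful of numeric parameters. It is not the authors' implementation: the function name `apply_isp`, the stage order (gain, white balance, color correction, gamma), the parameterization, and the assumption that the input is already demosaiced linear RGB in [0, 1] are all illustrative choices. The point is only that each stage is a simple, almost-everywhere differentiable function of its parameters, so a small predictor network could in principle output those parameters and be trained end to end with the downstream model.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]

# Hypothetical default: a color correction matrix that leaves colors unchanged.
IDENTITY_CCM = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def apply_isp(
    pixels: Sequence[Pixel],
    gain: float,
    wb: Sequence[float],
    ccm: Sequence[Sequence[float]],
    gamma: float,
) -> List[Pixel]:
    """Apply gain -> white balance -> color correction -> gamma to linear RGB.

    All parameters are plain numbers here; in an adapter-style setup they
    would be predicted per image rather than fixed by hand (an assumption
    made for illustration only).
    """
    out: List[Pixel] = []
    for px in pixels:
        # Global exposure gain combined with per-channel white balance.
        balanced = [gain * w * c for w, c in zip(wb, px)]
        # 3x3 color correction: each output channel mixes the three inputs.
        corrected = [sum(m * c for m, c in zip(row, balanced)) for row in ccm]
        # Clip to [0, 1], then apply a gamma curve as a simple tone mapping.
        toned = tuple(min(max(c, 0.0), 1.0) ** (1.0 / gamma) for c in corrected)
        out.append(toned)
    return out


# Usage: a dark gray pixel brightened by gamma 2.0 only.
print(apply_isp([(0.25, 0.25, 0.25)], 1.0, (1.0, 1.0, 1.0), IDENTITY_CCM, 2.0))
# → [(0.5, 0.5, 0.5)]
```

In this sketch the parameters are passed in by hand; replacing them with the outputs of a learned predictor is what would turn the fixed pipeline into an input-level adapter in the sense described above.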

Paper Structure (25 sections, 7 equations, 9 figures, 6 tables)

This paper contains 25 sections, 7 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: (a) An overview of a basic image signal processor (ISP) pipeline. (b) The ISP and current visual models have different objectives. (c) Previous methods optimize the ISP with the downstream visual model. (d) Our proposed RAW-Adapter.
  • Figure 2: Performance of RAW-based visual tasks with and without sRGB pre-trained weights. We analyze two methods: Dirty-Pixel [steven:dirtypixels2021] and RAW-Adapter. The blue line denotes training with MS COCO [COCO_dataset] pre-trained weights, the purple line denotes ImageNet [imagenet_cvpr09] pre-trained weights, and the yellow line denotes training from scratch.
  • Figure 3: (a) Structure of RAW-Adapter. The solid lines on the left denote the input-level adapter's workflow and the dotted lines denote the model-level adapter's workflow; stages 1$\sim$4 denote the different stages of the visual model backbone. (b) Detailed structure of the kernel and matrix predictors $\mathbb{P_K}$, $\mathbb{P_M}$. (c) Detailed structure of the merge block of the model-level adapter $\mathbb{M}$.
  • Figure 4: Left: we use query adaptive learning (QAL) to predict key parameters in the ISP process. Right: the parameters of RAW-Adapter's different blocks.
  • Figure 5: (a) Detection performance on PASCAL RAW [omid2014pascalraw] (normal/dark/over-exp). (b) Efficiency comparison (blue: vanilla, yellow: RAW-Adapter, gray: Dirty-Pixel [steven:dirtypixels2021]).
  • ...and 4 more figures