RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images and A Benchmark

Ziteng Cui, Jianfei Yang, Tatsuya Harada

TL;DR

This work introduces RAW-Adapter, a lightweight framework that aligns RAW camera data with sRGB-pretrained models through input-level adapters (predicting ISP-like parameters and color mappings) and model-level adapters (injecting ISP priors into backbone features). It also presents RAW-Bench, a 17-type RAW-corruption suite to evaluate in-domain and out-of-domain robustness, and a RAW-based data augmentation strategy to boost OOD generalization. Empirical results across object detection and semantic segmentation demonstrate SOTA or near-SOTA performance and improved robustness under diverse degradations, including cross-sensor transfers, with a modest parameter footprint. Together, RAW-Adapter and RAW-Bench offer a practical path toward robust, efficient RAW-domain perception suitable for real-world applications like autonomous driving and surveillance.
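To make "ISP-like parameters" concrete, here is a simplified, conventional ISP chain applied to one demosaiced linear RAW pixel. The chain runs through digital gain, per-channel white-balance gains, a 3×3 color correction matrix, clipping, and gamma encoding. This is an illustrative assumption, not RAW-Adapter's implementation. The function `apply_isp` and its arguments are hypothetical, and the values are set by hand. In RAW-Adapter, such quantities are predicted per image by learned modules, and the paper's exact set of stages may differ.

```python
from typing import List, Sequence

# Hypothetical, simplified ISP chain for illustration; not RAW-Adapter's code.
def apply_isp(pixel: Sequence[float], gain: float, wb: Sequence[float],
              ccm: Sequence[Sequence[float]], gamma: float = 2.2) -> List[float]:
    """Map one linear RGB pixel (values in [0, 1]) to a display-referred pixel."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # 1) Digital gain (exposure adjustment) in the linear domain.
    x = [gain * v for v in pixel]
    # 2) Per-channel white-balance gains.
    x = [w * v for w, v in zip(wb, x)]
    # 3) 3x3 color correction matrix: out_i = sum_j ccm[i][j] * x[j].
    x = [sum(ccm[i][j] * x[j] for j in range(3)) for i in range(3)]
    # 4) Clip to the valid range, then apply gamma encoding.
    return [min(max(v, 0.0), 1.0) ** (1.0 / gamma) for v in x]
```

For example, `apply_isp([0.1, 0.2, 0.1], 1.0, [2.0, 1.0, 1.5], identity, gamma=1.0)` gives approximately `[0.2, 0.2, 0.15]`. Every step is simple arithmetic. The same chain could therefore be written in an autodiff framework and its parameters learned from a downstream loss, which is the general idea behind learnable ISP modules.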

Abstract

In the computer vision community, the preference for pre-training visual models has largely shifted toward sRGB images due to their ease of acquisition and compact storage. However, camera RAW images preserve abundant physical details across diverse real-world scenarios. Despite this, most existing visual perception methods that use RAW data directly integrate image signal processing (ISP) stages with subsequent network modules, often overlooking potential synergies at the model level. Building on recent advances in adapter-based methodologies in both NLP and computer vision, we propose RAW-Adapter, a novel framework that incorporates learnable ISP modules as input-level adapters to adjust RAW inputs. At the same time, it employs model-level adapters to seamlessly bridge ISP processing with high-level downstream architectures. Moreover, RAW-Adapter is general and can be applied to a variety of computer vision frameworks. Furthermore, we introduce RAW-Bench, which incorporates 17 types of RAW-based common corruptions, including lightness degradations, weather effects, blurriness, camera imaging degradations, and variations in camera color response. Using this benchmark, we systematically compare the performance of RAW-Adapter with state-of-the-art (SOTA) ISP methods and other RAW-based high-level vision algorithms. Additionally, we propose a RAW-based data augmentation strategy to further enhance RAW-Adapter's performance and improve its out-of-domain (OOD) generalization ability. Extensive experiments substantiate the effectiveness and efficiency of RAW-Adapter, highlighting its robust performance across diverse scenarios.
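RAW-Bench's corruptions and the RAW-based augmentation both operate on RAW data, which is approximately linear in scene irradiance. One such operation is a low-light (lightness) corruption, sketched here under stated assumptions. The function `darken_raw` and its parameters are hypothetical. The noise model is a heteroscedastic Gaussian with variance `shot * x + read**2`, a common approximation of shot plus read noise. These are illustrative choices, not the benchmark's actual recipe or parameter values.

```python
import random
from typing import List, Optional

# Hypothetical low-light corruption on linear RAW values; not RAW-Bench's recipe.
def darken_raw(raw: List[float], ratio: float, shot: float = 0.0,
               read: float = 0.0, seed: Optional[int] = None) -> List[float]:
    """Simulate a darker capture of linear RAW values in [0, 1].

    ratio: exposure scale factor (e.g. 0.1 for ten times less light).
    shot, read: assumed noise parameters; noise variance = shot * x + read**2.
    """
    rng = random.Random(seed)  # seeded generator for reproducible corruption
    out = []
    for v in raw:
        x = v * ratio  # RAW is linear, so dimming is a plain multiplication
        std = (shot * x + read ** 2) ** 0.5  # signal-dependent noise level
        x += rng.gauss(0.0, std)
        out.append(min(max(x, 0.0), 1.0))  # sensor clipping
    return out
```

With `shot = read = 0` the function reduces to exact exposure scaling, for example `darken_raw([0.2, 0.4, 0.8], 0.5)` returns `[0.1, 0.2, 0.4]`. Sampling `ratio`, `shot`, and `read` at random for each training image would turn the same function into a simple RAW-domain augmentation.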

Paper Structure

This paper contains 39 sections, 19 equations, 17 figures, and 10 tables.

Figures (17)

  • Figure 1: Performance of RAW-based object detection with and without sRGB pre-trained weights. We analyze two methods: Dirty-Pixel [steven:dirtypixels2021] and our RAW-Adapter. The blue line denotes training with MS COCO [COCO_dataset] pre-trained weights, the green line denotes training with ImageNet [imagenet_cvpr09] pre-trained weights, and the red line denotes training from scratch. Detection performance is shown separately for the PASCAL RAW [omid2014pascalraw] and LOD [LOD_BMVC2021] datasets.
  • Figure 2: (a) An overview of a basic image signal processor (ISP) pipeline (the order and configuration of stages may vary with the ISP settings of different manufacturers [Michael_eccv16ISP_2005Tseng2022NeuralPhotoFinishing]). (b) The ISP and current high-level vision models have different objectives. (c) Previous methods optimize the ISP jointly with downstream high-level vision models. (d) Our proposed RAW-Adapter.
  • Figure 3: We propose RAW-Bench to evaluate the performance of current RAW-based vision frameworks against common corruptions, including lighting, weather, blur, camera imaging degradation, and variations in camera color response. Here, we present results on the PASCAL RAW-D dataset.
  • Figure 4: (a) Overall structure of RAW-Adapter. The solid line on the left denotes the input-level adapter's workflow (from input RAW data $\mathbf{I}_1 \rightarrow \mathbf{I}_2 \rightarrow \mathbf{I}_3 \rightarrow \mathbf{I}_4 \rightarrow \mathbf{I}_5$), and the dotted line denotes the model-level adapter's workflow; stages 1$\sim$4 denote the different stages of the visual model backbone. (b) Detailed structure of the kernel and matrix predictors $\mathbb{P}_K$ and $\mathbb{P}_M$. (c) Detailed structure of the merge block in the model-level adapter $\mathbb{M}$.
  • Figure 5: (a) We adopt query adaptive learning (QAL) to predict key parameters in the ISP process. (b) An illustration of the neural implicit 3D LUT (NILUT [conde2024nilut]). (c) Parameter counts of RAW-Adapter's different blocks.
  • ...and 12 more figures
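Figure 4(b) mentions a kernel predictor $\mathbb{P}_K$. To show how a predicted kernel could be applied to one RAW channel, the following sketch assumes the predictor outputs a single standard deviation `sigma` for an isotropic 3×3 Gaussian denoising kernel. The helper names `gaussian_kernel3` and `filter3x3` are hypothetical. The actual outputs of $\mathbb{P}_K$ and $\mathbb{P}_M$ are defined in the paper and may differ from this assumption.

```python
import math
from typing import List

Image = List[List[float]]  # one RAW channel as rows of floats

def gaussian_kernel3(sigma: float) -> Image:
    """Normalized 3x3 isotropic Gaussian; sigma is assumed to come from a predictor."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    k = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
          for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]  # weights sum to 1

def filter3x3(img: Image, k: Image) -> Image:
    """3x3 correlation with replicate (edge) padding; output keeps the input size."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        # Clamp coordinates so border pixels are repeated outside the image.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(k[dy + 1][dx + 1] * px(y + dy, x + dx)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]
```

Because the kernel is normalized to sum to 1, flat regions pass through unchanged while an isolated noisy spike is attenuated to the kernel's center weight. Replicate padding keeps the output the same size as the input, so later stages see an image of unchanged resolution.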