AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection

Yujin Wang, Tianyi Xu, Fan Zhang, Tianfan Xue, Jinwei Gu

TL;DR

This work addresses optimizing the Image Signal Processor (ISP) for downstream object detection rather than merely maximizing image quality. It introduces AdaptiveISP, which models ISP configuration as a Markov Decision Process and learns a greedy, stage-by-stage policy over differentiable ISP modules with a detector-guided reward and penalties for reuse and latency. Through an actor-critic RL setup and integration of a pre-trained detector (YOLO-v3), AdaptiveISP achieves state-of-the-art detection performance across multiple datasets while enabling real-time adaptation to scene dynamics and adjustable efficiency. The approach yields practical insights into which ISP modules most influence detection and how to balance accuracy and computational cost in dynamic environments.
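The greedy, stage-by-stage selection with reuse and latency penalties can be illustrated with a minimal toy sketch. It is not the authors' implementation: the learned policy network is replaced here by a brute-force one-step search over candidate modules, the detector is replaced by a hypothetical brightness-based proxy score, and all names (`Module`, `detector_score`, `build_pipeline`), module costs, and penalty weights are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[float]  # toy stand-in for a raw image: pixel intensities in [0, 1]


@dataclass(frozen=True)
class Module:
    name: str
    apply: Callable[[Image], Image]
    cost: float  # hypothetical relative latency of this module


def mean(img: Image) -> float:
    return sum(img) / len(img)


def denoise(img: Image) -> Image:
    # Toy smoothing: pull every pixel halfway toward the image mean.
    m = mean(img)
    return [0.5 * (p + m) for p in img]


def detector_score(img: Image) -> float:
    """Hypothetical proxy for detection quality (higher is better).

    A real system would derive this from a pre-trained detector; here we
    simply assume that a mid-gray mean brightness is best.
    """
    return 1.0 - abs(mean(img) - 0.5)


MODULES = [
    Module("gain", lambda img: [min(1.0, 2.0 * p) for p in img], cost=1.0),
    Module("gamma", lambda img: [p ** 0.5 for p in img], cost=2.0),
    Module("denoise", denoise, cost=1.5),
]


def build_pipeline(
    img: Image,
    reuse_weight: float = 0.1,     # assumed penalty for repeating a module
    latency_weight: float = 0.01,  # assumed penalty per unit of module cost
    max_stages: int = 5,           # assumed cap on pipeline length
) -> Tuple[List[str], Image]:
    """Greedily append the module with the highest positive reward per stage."""
    pipeline: List[str] = []
    for _ in range(max_stages):
        base = detector_score(img)
        best_reward, best_module, best_img = 0.0, None, img
        for module in MODULES:
            out = module.apply(img)
            reward = detector_score(out) - base       # detector-guided gain
            if module.name in pipeline:
                reward -= reuse_weight                # reuse penalty
            reward -= latency_weight * module.cost    # latency penalty
            if reward > best_reward:
                best_reward, best_module, best_img = reward, module, out
        if best_module is None:  # no module is worth its cost: stop early
            break
        pipeline.append(best_module.name)
        img = best_img
    return pipeline, img


if __name__ == "__main__":
    dark = [0.05, 0.1, 0.15, 0.2]
    bright = [0.4, 0.5, 0.6, 0.5]
    print(build_pipeline(dark)[0])
    print(build_pipeline(bright)[0])
    print(build_pipeline(dark, latency_weight=0.2)[0])
```

In this toy setting, the underexposed input receives a short pipeline, the well-exposed input receives none, and raising `latency_weight` shortens pipelines further, which mirrors the adjustable accuracy and cost trade-off described above.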

Abstract

Image Signal Processors (ISPs) convert raw sensor signals into digital images, a conversion that significantly influences image quality and the performance of downstream computer vision tasks. Designing the ISP pipeline and tuning ISP parameters are two key steps in building an imaging and vision system. To find optimal ISP configurations, recent works use deep neural networks as a proxy to search for ISP parameters or ISP pipelines. However, these methods are primarily designed to maximize image quality, which is sub-optimal for high-level computer vision tasks such as detection, recognition, and tracking. Moreover, after training, the learned ISP pipelines are mostly fixed at inference time, so their performance degrades in dynamic scenes. To jointly optimize ISP structures and parameters, we propose AdaptiveISP, a task-driven and scene-adaptive ISP. One key observation is that for the majority of input images, only a few processing modules are needed to improve the performance of downstream recognition tasks, and only a few inputs require more processing. Based on this, AdaptiveISP utilizes deep reinforcement learning to automatically generate an optimal ISP pipeline and the associated ISP parameters to maximize detection performance. Experimental results show that AdaptiveISP not only surpasses prior state-of-the-art methods for object detection but also dynamically manages the trade-off between detection performance and computational cost, making it especially suitable for scenes with large dynamic range variations. Project website: https://openimaginglab.github.io/AdaptiveISP/.

Paper Structure

This paper contains 23 sections, 13 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: AdaptiveISP takes a raw image as input and, using deep reinforcement learning, automatically generates an optimal ISP pipeline $\{M_i\}$ and the associated ISP parameters $\{\Theta_i\}$ to maximize detection performance for any given pre-trained object detection network. AdaptiveISP achieves an mAP@0.5 of 71.4 on the LOD dataset, while a baseline method [tseng2019hyperparameter] with a fixed ISP pipeline and optimized parameters achieves only 70.1. Note that AdaptiveISP predicts that the ISP for the image captured under normal light requires a CCM module, while the ISP for the image captured under low light requires a Desaturation module.
  • Figure 2: Overview of our method. The ISP configuration process is formulated as a Markov Decision Process, in which a CNN-based policy network predicts the selection of ISP modules and their parameters, and a CNN-based value network estimates the state value. YOLO-v3 [redmon2018yolov3] is used to calculate the reward for the current policy. The entire system is optimized using the actor-critic algorithm [konda1999actor, mnih2016asynchronous].
  • Figure 3: Object detection visualization results on the LOD dataset. Our method outperforms the state-of-the-art methods [tseng2019hyperparameter, qin2022attention, yu2021reconfigisp, shi2022refactoring] in terms of missed detections and false detections. Methods with fixed pipelines or fixed parameters struggle to handle varying noise levels and brightness conditions.
  • Figure 4: Image segmentation visualization results on the raw COCO dataset. Our method detects all the objects, while the state-of-the-art methods [tseng2019hyperparameter, qin2022attention, yu2021reconfigisp, shi2022refactoring] may miss some.
  • Figure 5: Cross-validation results of different ISP pipelines on the sub-datasets of the LOD dataset. Only the best-matching pipeline achieves the best result on each sub-dataset, which shows that different pipelines are necessary.
  • ...and 7 more figures