JOCA: Task-Driven Joint Optimisation of Camera Hardware and Adaptive Camera Control Algorithms

Chengyang Yan, Mitch Bryson, Donald G. Dansereau

TL;DR

This work addresses the joint optimisation of camera hardware and runtime control algorithms for perception tasks. It introduces DF-Grad, a hybrid optimisation strategy that combines derivative-free supervision from a genetic algorithm (GA) for dynamic camera parameters with gradient-based learning for the perception and adaptive camera control (ACC) components, enabling learning under non-differentiable image formation. The approach jointly optimises static (continuous and discrete) camera parameters and dynamic exposure and gain control, and it outperforms baselines in low-light and fast-motion scenarios on synthetic and real-world autonomous-driving data. The results indicate that task-driven, end-to-end camera system design within a unified optimisation framework is both feasible and beneficial.

Abstract

The quality of captured images strongly influences the performance of downstream perception tasks. Recent works on co-designing camera systems with perception tasks have shown improved task performance. However, most prior approaches focus on optimising fixed camera parameters set at manufacturing, while many parameters, such as exposure settings, require adaptive control at runtime. This paper introduces a method that jointly optimises camera hardware and adaptive camera control algorithms with downstream vision tasks. We present a unified optimisation framework that integrates gradient-based and derivative-free methods, enabling support for both continuous and discrete parameters, non-differentiable image formation processes, and neural network-based adaptive control algorithms. To address non-differentiable effects such as motion blur, we propose DF-Grad, a hybrid optimisation strategy that trains adaptive control networks using signals from a derivative-free optimiser alongside unsupervised task-driven learning. Experiments show that our method outperforms baselines that optimise static and dynamic parameters separately, particularly under challenging conditions such as low light and fast motion. These results demonstrate that jointly optimising hardware parameters and adaptive control algorithms improves perception performance and provides a unified approach to task-driven camera system design.

Paper Structure

This paper contains 22 sections, 5 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: We introduce a novel end-to-end method that jointly optimises camera hardware parameters, adaptive camera control algorithms, and perception tasks to improve task performance. We combine a fitness function $F_{\text{task}}$ for a derivative-free optimiser and a loss function $l_{\text{task}}$ for a gradient-based optimiser using the proposed DF-Grad method to update neural network-based adaptive control algorithms, allowing them to learn from a non-differentiable image formation process. The method supports optimisation of static camera parameters that are continuous and discrete, as well as dynamic camera parameters, enabling task-aware and adaptive camera design.
  • Figure 2: A virtual camera with configurable parameters is used to render images ($I_{0t}$) from scenes under dynamic illumination. In forward mode (standard mode for training and inference), the render is adjusted by dynamic parameters predicted by the ACC algorithm and augmented with physically realistic noise to produce the adjusted image ($I_t$), which is then evaluated by the perception task and used to predict dynamic parameters for the next frame. During training, camera parameters $\Phi_{\text{cam}}$ are optimised by GA using a task fitness function $F_{\text{task}}$ (blue arrow), while task model parameters $\Phi_{\text{task}}$ are updated by gradient descent on the task loss $l_{\text{task}}$ (red arrow). The ACC network parameters $\Phi_{\text{ACC}}$ are trained using the proposed DF-Grad method, which combines the task loss $l_{\text{task}}$ with a GA supervision loss $l_{\text{GA}}$. The latter measures the difference between ACC predictions and GA-perturbed parameters. The combined loss is used to update $\Phi_{\text{ACC}}$ via a gradient-based method. In perturbation mode (only for perturbation optimisation), the render is adjusted with GA-perturbed parameters and evaluated in the same way as in forward mode.
  • Figure 3: Qualitative results using the CARLA simulator. We compare the proposed method with two families of baselines across different design scenarios: approaches that jointly optimise NeuralAE and the object detector using human-designed cameras, and methods that jointly optimise camera hardware and the object detector using the non-trainable AverageAE algorithm. The jointly optimised camera and ACC algorithm from our method consistently produce sharp images with high effective object resolution and reduced motion blur. This leads to improved task performance, particularly in detecting small and distant objects.
  • Figure 4: Adaptive camera control algorithm. We adopt the architecture from NeuralAE [onzon2021neural] as the main architecture for the ACC algorithm. We modify NeuralAE by using a single camera instead of two cameras, and by concatenating features from the predicted dynamic parameters at the previous step to the extracted features in the semantic feature branch for temporal consistency. Finally, we allow it to predict both exposure time and gain, rather than a single exposure value as in its original version. Figure is adapted from [onzon2021neural].
  • Figure 5: Visualisation of the camera FoVs and placements for our designed camera, the FLIR/Basler cameras using the nuScenes placement, and the camera designed by TaCOS with the AverageAE method.
  • ...and 3 more figures
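
The GA supervision idea in the Figure 2 caption (a loss on the difference between ACC predictions and perturbed parameters that score better on a non-differentiable fitness) can be illustrated with a toy sketch. Everything here is an assumption for illustration and not the authors' implementation: the one-parameter "ACC" model, the quantised `fitness` function, and the names `perturb_search`, `supervision_loss`, and `train` are hypothetical. A simple random perturbation search stands in for the GA, and the task-loss term $l_{\text{task}}$ that DF-Grad also uses is omitted.

```python
import random


def fitness(exposure, brightness):
    """Hypothetical non-differentiable fitness (higher is better).

    The image level is quantised to steps of 1/8 and clipped at 2.0, so it
    is piecewise constant in the exposure and gives no useful gradient.
    The best score is reached when the quantised level equals 1.0.
    """
    level = min(2.0, round(exposure * brightness * 8) / 8)
    return -abs(level - 1.0)


def perturb_search(pred, brightness, rng, n=32, sigma=0.2):
    """Derivative-free step (a stand-in for the GA, not a real GA).

    Samples perturbed exposures around the prediction and returns the
    fittest one. The prediction is listed first, so it wins any ties.
    """
    candidates = [pred] + [max(1e-3, pred + rng.gauss(0.0, sigma)) for _ in range(n)]
    return max(candidates, key=lambda e: fitness(e, brightness))


def supervision_loss(pred, target):
    """Squared difference between the prediction and the perturbed target."""
    return (pred - target) ** 2


def train(steps=2000, lr=0.05, seed=0):
    """Fit a one-parameter toy controller: exposure = w / brightness."""
    rng = random.Random(seed)
    w = 0.3  # deliberately poor initial controller
    for _ in range(steps):
        brightness = rng.uniform(0.5, 2.0)
        pred = w / brightness
        target = perturb_search(pred, brightness, rng)
        # Analytic gradient of supervision_loss with respect to w;
        # the target is treated as a constant.
        grad = 2.0 * (pred - target) / brightness
        w -= lr * grad
    return w


if __name__ == "__main__":
    print("learned w:", train())
```

In this toy setup the fitness is maximal when `exposure * brightness` rounds to 1.0, so `w` should drift towards 1 even though no gradient ever passes through `fitness`. The search supplies the target, and ordinary gradient descent on the supervision loss moves the controller towards it.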