
DRIFT: Deep Restoration, ISP Fusion, and Tone-mapping

Soumendu Majee, Joshua Peter Ebenezer, Abhinau K. Venkataramanan, Weidi Liu, Thilo Balke, Zeeshan Nadir, Sreenithy Chandran, Seok-Jun Lee, Hamid Rahim Sheikh

Abstract

Smartphone cameras have gained immense popularity with the adoption of high-resolution and high-dynamic range imaging. As a result, high-performance camera Image Signal Processors (ISPs) are crucial in generating high-quality images for the end user while keeping computational costs low. In this paper, we propose DRIFT (Deep Restoration, ISP Fusion, and Tone-mapping): an efficient AI mobile camera pipeline that generates high-quality RGB images from hand-held raw captures. The first stage of DRIFT is a Multi-Frame Processing (MFP) network that is trained using an adversarial perceptual loss to perform multi-frame alignment, denoising, demosaicing, and super-resolution. Then, the output of DRIFT-MFP is processed by a novel deep-learning-based tone-mapping (DRIFT-TM) solution that allows for tone tunability, ensures tone-consistency with a reference pipeline, and can be run efficiently for high-resolution images on a mobile device. We show qualitative and quantitative comparisons against state-of-the-art MFP and tone-mapping methods to demonstrate the effectiveness of our approach.

Paper Structure

This paper contains 26 sections, 9 equations, 10 figures, and 5 tables.

Figures (10)

  • Figure 1: Overview of the proposed DRIFT pipeline. In the first part of DRIFT, DRIFT-MFP performs deep restoration of the multi-frame raw data comprising regular (EV0) and short (EV-) exposures and outputs a single restored RGB frame for each of the two exposures. The Fusion ISP then fuses the exposures together to form a single-frame HDR RGB image. Finally, DRIFT-TM performs efficient tone-mapping on the HDR RGB image to produce the final sRGB tone-mapped image. Our method allows for passing tuning inputs during inference to adjust the appearance of the final output.
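  The staged dataflow described in the Figure 1 caption can be sketched as a composition of three stages. This is a minimal illustrative sketch only: the function names and the placeholder implementations (averaging for restoration and fusion, a Reinhard-style curve for tone-mapping) are assumptions for exposition, not the paper's actual networks or API.

  ```python
  import numpy as np

  def drift_mfp(raw_burst):
      # Placeholder for DRIFT-MFP: collapse a burst of raw frames into
      # one restored RGB frame (here: temporal average, channels replicated).
      avg = np.mean(raw_burst, axis=0)
      return np.stack([avg, avg, avg], axis=-1)

  def fusion_isp(rgb_ev0, rgb_evm):
      # Placeholder Fusion ISP: merge the EV0 and EV- exposures into one
      # HDR RGB image (here: a simple average).
      return 0.5 * (rgb_ev0 + rgb_evm)

  def drift_tm(hdr_rgb, strength=1.0):
      # Placeholder for DRIFT-TM: compress HDR to display range with a
      # Reinhard-style curve; `strength` stands in for a tuning input.
      return hdr_rgb / (hdr_rgb + strength)

  # Toy bursts: 4 single-channel 8x8 raw frames per exposure.
  ev0_burst = np.random.rand(4, 8, 8)
  evm_burst = 0.5 * np.random.rand(4, 8, 8)

  hdr = fusion_isp(drift_mfp(ev0_burst), drift_mfp(evm_burst))
  sdr = drift_tm(hdr)
  ```

  The point of the sketch is the pipeline shape: each exposure's burst is restored independently, the two restored frames are fused into HDR, and tone-mapping (with an inference-time tuning input) produces the final SDR output.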
  • Figure 2: Overview of the training and inference pipelines for DRIFT-MFP. During training, we model handshake motions using homographies from real data and apply them to tripod captures of corresponding short exposure and long exposure images with equalized brightness.
  • Figure 3: Overview of the training and inference pipelines for DRIFT-TM. We train DRIFT-TM using a computationally expensive reference tone-map. DRIFT-TM learns the residual enhancements from a lightweight tone-map, allowing more robust learning as well as tunability by modulating the enhancements.
  • Figure 4: DRIFT Tone-map network architecture. We incorporate a local encoder that encodes a full-resolution image per tile, a global encoder that encodes a low-resolution full image, and a metadata encoder to encode capture metadata.
  • Figure 5: Denoising results across various scenes. For each scene (row), the columns correspond to: (a) the low-quality input image, (b) BIPNet, (c) MPRNet, (d) Burstormer, (e) Restormer, (f) NAFNet, (g) our proposed method DRIFT-MFP, and (h) the Ground Truth (GT). Our method consistently outperforms the baselines in terms of visual quality and fidelity to the GT.
  • ...and 5 more figures