
Removing Reflections from RAW Photos

Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen, Marc Levoy

TL;DR

This work tackles real-world reflection removal in consumer photography by focusing on RAW images and introducing a photometrically and geometrically accurate synthetic data pipeline. A two-stage system, consisting of a 256^2 base model and a fast upsampler, operates on linear RAW data and optionally uses a contextual photo to disambiguate reflections, delivering full-resolution results suitable for editing. Training exclusively on simulated RAW data yields strong generalization to real images, outperforming prior methods even when those methods are retrained on RAW data; the contextual cue further improves separation quality, and the upsampling strategy minimizes artifacts compared to existing approaches. The approach enables on-device previews in seconds and provides separate transmission and reflection components to support user edits, marking a practical advance for photo editing pipelines and privacy-preserving dereflection.

Abstract

We describe a system to remove real-world reflections from images for consumer photography. Our system operates on linear (RAW) photos, and accepts an optional contextual photo looking in the opposite direction (e.g., the "selfie" camera on a mobile device). This optional photo disambiguates what should be considered the reflection. The system is trained solely on synthetic mixtures of real RAW photos, which we combine using a reflection simulation that is photometrically and geometrically accurate. Our system comprises a base model that accepts the captured photo and optional context photo as input, and runs at 256p, followed by an up-sampling model that transforms 256p images to full resolution. The system produces preview images at 1K in 4.5-6.5s on a MacBook or iPhone 14 Pro. We show SOTA results on RAW photos that were captured in the field to embody typical consumer photos, and show that training on RAW simulation data improves performance more than the architectural variations among prior works.
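The abstract describes a two-stage flow: a base model runs at 256p on the captured photo plus an optional context photo, and an up-sampling model then produces full-resolution outputs. The sketch below shows only that data flow, under stated assumptions:

- Both learned models are replaced by hypothetical stubs (`base_model_stub`, `upsampler_stub`).
- Images are float arrays of linear values whose height and width are multiples of 256.
- The upsampler stub uses a simple detail-transfer heuristic (all high-frequency detail goes to the transmission layer). This heuristic is not the paper's upsampler.

```python
import numpy as np

BASE_RES = 256  # base-model resolution from the abstract (256p; 256x256 per the TL;DR)

def downsample(img, size=BASE_RES):
    """Box-filter an H x W x C linear image to size x size.

    Averaging is physically meaningful in linear (RAW) units. Assumes H and W are
    multiples of `size`; a real pipeline would handle arbitrary sizes and aspect ratios.
    """
    h, w, c = img.shape
    return img.reshape(size, h // size, size, w // size, c).mean(axis=(1, 3))

def upsample_nearest(img, fh, fw):
    """Nearest-neighbour upsampling by integer factors (fh, fw)."""
    return np.repeat(np.repeat(img, fh, axis=0), fw, axis=1)

def base_model_stub(captured, context=None):
    """Hypothetical stand-in for the learned 256p base model.

    Returns (transmission, reflection). A trained model would use `context`
    (e.g. the selfie-camera photo) to decide what counts as reflection; this
    stub ignores it and predicts no reflection.
    """
    return captured.copy(), np.zeros_like(captured)

def upsampler_stub(captured_full, t_small, r_small):
    """Hypothetical stand-in for the learned upsampler (detail-transfer heuristic).

    Upsamples both 256p layers, then assigns the full-resolution detail missing
    from the 256p input to the transmission layer. This assumes an additive linear
    mixture (captured = transmission + reflection) and a defocused reflection.
    """
    h, w, _ = captured_full.shape
    fh, fw = h // t_small.shape[0], w // t_small.shape[1]
    detail = captured_full - upsample_nearest(downsample(captured_full), fh, fw)
    t_full = upsample_nearest(t_small, fh, fw) + detail
    r_full = upsample_nearest(r_small, fh, fw)
    return t_full, r_full

def remove_reflection(captured, context=None,
                      base_model=base_model_stub, upsampler=upsampler_stub):
    """Two-stage flow: run the base model at 256p, then upsample to full resolution."""
    small = downsample(captured)
    small_context = downsample(context) if context is not None else None
    t_small, r_small = base_model(small, small_context)
    return upsampler(captured, t_small, r_small)
```

Because the mixture is modeled additively in linear units, this heuristic keeps transmission + reflection equal to the captured photo at full resolution whenever the base model's two layers sum to its 256p input. That property depends on working in linear rather than tone-mapped values.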

Paper Structure (37 sections, 3 equations, 23 figures, 3 tables, 18 algorithms)

Figures (23)

  • Figure 1: Results of our reflection removal system. We use linear (RAW) images with an optional contextual photo, and output the clean and reflection images in linear color for editing, at full resolution (shown at $2\mathrm{K}$). Prior works use tone-mapped images at $\approx 256$p, yielding lower quality and inaccurate color. Brightness and contrast changes relative to the captured photos are caused by removing the reflection and are correct.
  • Figure 2: System
  • Figure 3: The importance of synthesizing training data (top row) from linear images (middle row), compared to prior work. (a) Photometrically accurate illuminant colors are simulated by mixing before white balancing; mixing 8-bit white-balanced images produces markedly different colors. (b) Mixing in scene-referred linear units produces reflections that are strong in the shadows but transparent in the highlights. (prior work) Such effects are visibly incorrect in prior work that blends $8$-bit tone-mapped images wen2019, fan2017. (bottom) Real and simulated examples are shuffled together. For each real image, a similar synthetic reflection was manually found in the dataset. Real images were not captured to match known examples; these qualitative matches exist because the dataset contains more than $10^6$ images (even-numbered images are synthetic).
  • Figure 4: Results at $2048$p; base outputs inset.
  • Figure 5: Upsampling ground-truth (GT) images from $256$p to $2048$p. V-DESIRR prasad2021 introduces artifacts.
  • ...and 18 more figures
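Figure 3 makes two claims about simulating reflections from linear images:

- (a) Mixing before white balancing preserves each scene's illuminant color.
- (b) Adding light in scene-referred linear units makes a reflection strong in shadows and faint in bright regions.

The sketch below illustrates both effects with toy pixel values. It is not the paper's simulation, and it relies on several assumptions:

- The sRGB transfer curve stands in for a camera tone curve.
- White balance is modeled as per-channel (von Kries) gains.
- 8-bit quantization is omitted.
- "Display-referred mixing" is shown in one simple form: adding tone-mapped values.
- All intensities and illuminant colors are hypothetical.

```python
import numpy as np

def srgb_encode(linear):
    """sRGB transfer curve (IEC 61966-2-1), standing in for a camera tone curve."""
    x = np.clip(linear, 0.0, 1.0)
    return np.where(x <= 0.0031308, 12.92 * x, 1.055 * x ** (1 / 2.4) - 0.055)

# (b) Visibility of the same reflection over a dark and a bright transmission pixel.
T = np.array([0.01, 0.5])  # hypothetical linear transmission: shadow, bright region
R = 0.02                   # hypothetical linear reflection intensity

# Linear mixing: add scene-referred light first, then tone-map for display.
linear_contrast = srgb_encode(T + R) - srgb_encode(T)
# Display-referred mixing: add the reflection to already tone-mapped values.
display_contrast = (srgb_encode(T) + srgb_encode(R)) - srgb_encode(T)

# (a) Illuminant color: camera-RGB response of a gray surface (hypothetical values).
warm = np.array([1.0, 0.8, 0.5])  # illuminant of the transmitted scene
cool = np.array([0.7, 0.8, 1.0])  # illuminant of the reflected scene

def white_balance(raw, illuminant):
    """Per-channel (von Kries) gains that map `illuminant` to neutral gray."""
    return raw / illuminant

# Mix RAW values first, then white-balance once for the transmitted scene.
mixed_wb = white_balance(0.4 * warm + 0.1 * cool, warm)
reflection_part = mixed_wb - 0.4  # remove the (now neutral) transmission

# White-balance each image separately, then mix.
separately = white_balance(0.4 * warm, warm) + white_balance(0.1 * cool, cool)
```

In this toy setup, `linear_contrast` is several times larger for the shadow pixel than for the bright one, while `display_contrast` is identical for both. Likewise, `reflection_part` is blue-tinted (its blue channel exceeds its red channel), whereas `separately` is neutral gray. These outcomes mirror, in miniature, the differences panels (a) and (b) describe.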