FUMO: Prior-Modulated Diffusion for Single Image Reflection Removal

Telang Xu, Chaoyang Zhang, Guangtao Zhai, Xiaohong Liu

Abstract

Single image reflection removal (SIRR) is challenging in real scenes, where reflection strength varies spatially and reflection patterns are tightly entangled with transmission structures. This paper presents FUMO, a diffusion framework with prior modulation that introduces explicit guidance signals to improve spatial controllability and structural faithfulness. Two priors are extracted directly from the mixed image: an intensity prior that estimates spatial reflection severity, and a high-frequency prior that captures detail-sensitive responses via multi-scale residual aggregation. We propose a coarse-to-fine training paradigm. In the first stage, these cues are combined to gate the conditional residual injections, focusing the conditioning on regions that are both reflection-dominant and structure-sensitive. In the second stage, a fine-grained refinement network corrects local misalignment and sharpens fine details in image space. Experiments on both standard benchmarks and challenging in-the-wild images demonstrate competitive quantitative results and consistently improved perceptual quality. The code is released at https://github.com/Lucious-Desmon/FUMO.
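To make "multi-scale residual aggregation" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a grayscale mixed image stored as a 2D NumPy array, uses a simple box blur as the low-pass step, and averages the absolute residuals across scales. The function names, the kernel sizes, and the choice of blur are all illustrative assumptions; the released code may differ on each point.

```python
import numpy as np

def box_blur(img, k):
    """Mean filter with a k x k window (k odd), using edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the k*k shifted views of the padded image, then divide by the window size.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def high_frequency_prior(mixed, kernel_sizes=(3, 5, 9)):
    """Hypothetical high-frequency prior: aggregate blur residuals over scales.

    At each scale the current image is split into a low-pass part and a
    residual; the residual magnitudes are averaged and rescaled to [0, 1].
    """
    current = mixed.astype(np.float64)
    acc = np.zeros_like(current)
    for k in kernel_sizes:
        low = box_blur(current, k)
        acc += np.abs(current - low)  # detail removed at this scale
        current = low                 # decompose the low-pass part further
    acc /= len(kernel_sizes)
    peak = acc.max()
    # Guard against division by (near) zero on flat inputs.
    return acc / peak if peak > 1e-12 else np.zeros_like(acc)

# Usage: a vertical step edge yields a prior concentrated around the edge.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
prior = high_frequency_prior(step)
```

A flat image produces an all-zero map under this sketch, while edges and fine texture produce responses close to 1, which is the behaviour a detail-sensitive prior is expected to have.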

Paper Structure (25 sections, 13 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Failure-mode visualization on in-the-wild reflection mixtures. Three representative real-world mixed images are shown together with two priors. Qualitative comparisons with representative SOTA methods illustrate common challenges in the wild, including incomplete reflection suppression, color inconsistency, and loss of fine details. Red rectangles highlight regions for closer inspection.
  • Figure 2: The pipeline of dual prior extraction. In branch I, the mixed image $\mathbf{M}$ is divided into patches, and the intensity prior $\mathbf{P}_{\mathrm{int}}$ is finally obtained through scoring and localization. In branch II, the mixed image $\mathbf{M}$ is iteratively decomposed and aggregated to yield the high-frequency prior $\mathbf{P}_{\mathrm{hf}}$.
  • Figure 3: The framework of the proposed FUMO method. Given a mixed image $\mathbf{M}$, we obtain two priors $\mathbf{P}_{\mathrm{int}}$ and $\mathbf{P}_{\mathrm{hf}}$ through dual prior extraction. A diffusion-based backbone performs conditional denoising, where the extracted features and gates are injected through element-wise fusion operations to guide multi-scale feature aggregation. The decoder produces a coarse restoration, which is further refined by the trainable fine-grained refinement module (FGRM).
  • Figure 4: Qualitative comparisons on representative examples from the three benchmarks. The red rectangles highlight key regions for comparison.
  • Figure 5: Qualitative comparisons on challenging real-world mixed images.
  • ...and 2 more figures
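The Figure 3 caption describes gates being injected through element-wise fusion. One possible reading is sketched below, under stated assumptions: the gate is taken as the element-wise product of $\mathbf{P}_{\mathrm{int}}$ and $\mathbf{P}_{\mathrm{hf}}$ (so that it is large only where both priors are large), it is average-pooled to the feature resolution, and it scales a conditional residual that is added to the backbone feature. The names `downsample_mean` and `gated_injection`, the product rule, and the pooling choice are hypothetical and not taken from the paper.

```python
import numpy as np

def downsample_mean(prior, factor):
    """Average-pool a 2D map by an integer factor (trailing rows/cols are cropped)."""
    h, w = prior.shape
    h2, w2 = h // factor, w // factor
    cropped = prior[:h2 * factor, :w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def gated_injection(feat, cond_residual, p_int, p_hf):
    """Hypothetical gated residual injection.

    feat, cond_residual: arrays of shape (C, h, w) at one backbone scale.
    p_int, p_hf: priors of shape (H, W) with values in [0, 1], H a multiple of h.
    """
    factor = p_int.shape[0] // feat.shape[1]
    # Assumption: combine priors multiplicatively, so the gate is high only where
    # a region is both reflection-dominant and structure-sensitive.
    gate = downsample_mean(p_int * p_hf, factor)
    # Broadcast the (h, w) gate over channels and add the gated residual.
    return feat + gate[None, :, :] * cond_residual

# Usage with random placeholders standing in for real features and priors.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))
residual = rng.normal(size=(4, 8, 8))
p_int = rng.uniform(size=(32, 32))
p_hf = rng.uniform(size=(32, 32))
fused = gated_injection(feat, residual, p_int, p_hf)
```

With this form, a zero gate leaves the backbone feature untouched and a gate of one applies the full conditional residual, so the priors control where the conditioning takes effect.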