Strong and Controllable Blind Image Decomposition

Zeyu Zhang, Junlin Han, Chenhui Gou, Hongdong Li, Liang Zheng

TL;DR

This work addresses controllable blind image decomposition (BID) by enabling user-directed removal or preservation of degradations within a single framework. It introduces CBDNet, a three-block architecture (decomposition, controllability, recombination) built on a Restormer-based encoder–decoder, in which decomposition and recombination are parameter-free and a prompt-driven mechanism translates user instructions into selective component retention. A new multi-domain degradation dataset with nine degradations supports both standard BID and controllable BID evaluations, and CBDNet achieves state-of-the-art or competitive results across BID tasks while maintaining high efficiency. The approach offers practical benefits for copyright protection and customizable restoration, signaling a path toward user-guided low-level vision in real-world scenarios.

Abstract

Blind image decomposition aims to decompose all components present in an image, and is typically used to restore a multi-degraded input image. While fully recovering the clean image is appealing, in some scenarios users might want to retain certain degradations, such as watermarks, for copyright protection. To address this need, we add controllability to the blind image decomposition process, allowing users to specify which types of degradation to remove or retain. We design an architecture named controllable blind image decomposition network. Inserted in the middle of a U-Net structure, our method first decomposes the input feature maps and then recombines them according to user instructions. Advantageously, this functionality is implemented at minimal computational cost: decomposition and recombination are both parameter-free. Experimentally, our system excels in blind image decomposition tasks and can output partially or fully restored images that faithfully reflect user intentions. Furthermore, we evaluate and configure different options for the network structure and loss functions. This, combined with the proposed decomposition-and-recombination method, yields an efficient system for blind image decomposition that is competitive with current state-of-the-art methods.
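
The decompose-then-recombine idea in the abstract can be illustrated with a minimal sketch. It assumes that decomposition is an equal channel-wise split of the feature map and that recombination zeroes out the unselected groups before concatenating them back; the abstract states only that both steps are parameter-free, so the function names `decompose` and `recombine`, the split rule, and the zero-masking are illustrative assumptions and not the authors' implementation.

```python
from typing import List, Sequence

Channel = List[List[float]]  # one H x W feature channel


def decompose(features: Sequence[Channel], num_components: int) -> List[List[Channel]]:
    """Split C channels into num_components equal groups (no learnable weights).

    Assumption: an equal channel-wise split stands in for the decomposition block.
    """
    if len(features) % num_components != 0:
        raise ValueError("channel count must be divisible by num_components")
    group = len(features) // num_components
    return [list(features[i * group:(i + 1) * group]) for i in range(num_components)]


def recombine(components: Sequence[Sequence[Channel]], keep: Sequence[bool]) -> List[Channel]:
    """Zero out unselected groups and concatenate all groups back to C channels.

    Assumption: zero-masking stands in for the recombination block.
    """
    if len(components) != len(keep):
        raise ValueError("need one keep flag per component")
    out: List[Channel] = []
    for group, flag in zip(components, keep):
        for channel in group:
            out.append([[v if flag else 0.0 for v in row] for row in channel])
    return out


# Toy feature map: 4 channels of size 1 x 2, split into two hypothetical components.
features = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]]]
parts = decompose(features, num_components=2)
recombined = recombine(parts, keep=[True, False])  # retain component 0 only
```

Neither function holds weights, so under these assumptions the only cost added between encoder and decoder is indexing and masking.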

Paper Structure

This paper contains 29 sections, 3 equations, 17 figures, 10 tables.

Figures (17)

  • Figure 1: Varying user demands for image processing. Whether a component in an image is considered deteriorative or valuable content hinges on the user's intention. For a cluttered input image degraded by various components, users might want to (a) remove the interfering fence and reflection, obtaining a rainy street scene, (b) eliminate all degradations to get a clean street scene, (c) keep the watermark for the sake of copyright protection, and (d-f) extract the reflection, watermark, and fence.
  • Figure 2: Workflow of controllable blind image decomposition (controllable BID). For synthetic input images that exhibit one or multiple degradations, the controllable BID module first predicts which degradations are present. This prediction helps users write precise instruction prompts. Once these prompts are formulated, the controllable BID module processes the input image according to the user-provided instructions. The generated images include the restored image, the degradation masks, and images with specific degradations removed.
  • Figure 3: Comparison between our proposed method, CBDNet, and state-of-the-art methods on the BID task. The input images contain rain streaks, raindrops, snow, and haze, and the output is a clean image; this is the most challenging task in BID. Here, Restormer* denotes the version of Restormer (Zamir et al., 2022) that we trained on the BID dataset with a fixed resolution. Our system not only restores images better but also uses fewer trainable parameters.
  • Figure 4: Architecture of CBDNet. Upon receiving an input image with various degradations, the encoder transforms it into a deep feature map. The decomposition block then splits this map into several component feature maps, each corresponding to one type of degradation. Within the controllability block, the source classifier uses these component feature maps to classify which sources are present, making it easy for the user to give an instruction prompt. The prompt converter then turns this prompt into a categorical vector indicating the image processing to be executed. In the recombination block, the chosen component feature maps are mixed with the categorical vector of the prompt and then reconstructed into a restored image by the decoder.
  • Figure 5: Visualization of feature maps output by the decomposition block. Each input image is composed of the degradations shown on the left; passing it through the encoder and decomposition block produces the feature maps on the right. The red boxes highlight that feature maps for the same component are nearly identical across different images. In contrast, the blue boxes show that feature maps for different components are completely dissimilar. Furthermore, the green boxes illustrate that degradations absent from the input image yield feature maps without significant patterns, unlike those of degradations that are present.
  • ...and 12 more figures
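
The controllability workflow in the Figure 2 and Figure 4 captions (predict which degradations are present, let the user write a prompt, convert the prompt into a categorical vector) might be sketched as follows. The captions do not specify the encoding, so the binary retain/remove vector, the 0.5 threshold, the component order, and the function names `detected` and `prompt_to_vector` are all assumptions made for illustration; only the four degradation names are taken from the Figure 3 caption.

```python
from typing import Dict, Iterable, List

# Hypothetical component order; the names come from the Figure 3 caption.
COMPONENTS = ["rain streak", "raindrop", "snow", "haze"]


def detected(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return components whose (assumed) source-classifier score reaches the threshold."""
    return [name for name in COMPONENTS if scores.get(name, 0.0) >= threshold]


def prompt_to_vector(remove: Iterable[str]) -> List[int]:
    """Encode a prompt as one flag per component: 1 = retain, 0 = remove.

    Assumption: a binary vector stands in for the categorical vector.
    """
    remove_set = set(remove)
    unknown = remove_set - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return [0 if name in remove_set else 1 for name in COMPONENTS]


# Made-up classifier scores for one input image.
scores = {"rain streak": 0.9, "raindrop": 0.1, "snow": 0.8, "haze": 0.2}
present = detected(scores)                  # shown to the user before prompting
vector = prompt_to_vector(remove=["snow"])  # user asks to remove snow only
```

Showing the detected components first mirrors the caption's point that source classification helps the user formulate a precise prompt; the resulting vector is what a recombination step would consume.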