When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability

Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao

TL;DR

To enhance controllability with inexplicit masks, an advanced Shape-aware ControlNet consisting of a deterioration estimator and a shape-prior modulation block is devised.

Abstract

ControlNet excels at generating content that closely follows the precise contours of user-provided masks. However, when these masks contain noise, as frequently happens with non-expert users, the output includes unwanted artifacts. Through in-depth analysis, this paper first highlights the crucial role of controlling the impact of such inexplicit masks across diverse deterioration levels. Then, to enhance controllability with inexplicit masks, we devise Shape-aware ControlNet, which consists of a deterioration estimator and a shape-prior modulation block. The deterioration estimator assesses the deterioration factor of the provided mask. The modulation block then uses this factor to adaptively modulate the model's contour-following ability, which helps it disregard the noisy parts of inexplicit masks. Extensive experiments demonstrate that this encourages ControlNet to interpret inaccurate spatial conditions robustly rather than blindly following the given contours, and that the approach suits diverse kinds of conditions. We showcase application scenarios such as modifying shape priors and composable shape-controllable generation. Code is available on GitHub.

Paper Structure (38 sections, 8 equations, 25 figures, 6 tables)

This paper contains 38 sections, 8 equations, 25 figures, 6 tables.

Figures (25)

  • Figure 1: ControlNet tends to preserve contours in spatially controllable generation across multi-modal control inputs, where green denotes recalled contours and blue denotes missing ones. However, inexplicit masks cause catastrophic degradation of image fidelity and realism. This paper substantially improves ControlNet's robustness in interpreting inexplicit masks with inaccurate contours.
  • Figure 2: The metric curves of ControlNet-$\boldsymbol{m}_r$ on masks of varying deterioration degrees. The vanilla ControlNet, i.e., ControlNet-$\boldsymbol{m}_0$, degrades dramatically in CLIP-Score and FID on deteriorated masks, yet still adheres to the contours of the provided masks. ControlNet-$\boldsymbol{m}_r$ ($r>0$) becomes more robust on deteriorated masks as the dilation radius $r$ grows.
  • Figure 3: Illustration of the inductive bias of ControlNet-$\boldsymbol{m}_r$ conditioned on $\boldsymbol{m}_r$, where high CR on $\boldsymbol{m}_0$ indicates models implicitly learn the dilation radius $r$.
  • Figure 4: Metric curves of (a) CFG scale, (b) conditioning scale, and (c) condition injection strategy for the vanilla ControlNet. Red denotes the performance on the precise mask $\boldsymbol{m}_0$, and blue denotes the bounding-box mask $\boldsymbol{m}_\infty$.
  • Figure 5: The overall architecture of Shape-aware ControlNet. It contains 1) a deterioration estimator that assesses the deterioration ratio of inexplicit masks, and 2) a shape-prior modulation block that injects this ratio into ControlNet to adjust its contour-following ability for robust spatial control with inexplicit masks.
  • ...and 20 more figures
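
The captions above refer to a precise mask $\boldsymbol{m}_0$, dilated masks $\boldsymbol{m}_r$ with dilation radius $r$, and a bounding-box mask $\boldsymbol{m}_\infty$. As an illustration only, the following minimal sketch shows one way such deteriorated masks could be derived from a binary mask. The function names `dilate_mask` and `bbox_mask`, the disk-shaped structuring element, and the pixel unit of `r` are assumptions made here and are not taken from the paper's code.

```python
import numpy as np


def dilate_mask(m0: np.ndarray, r: int) -> np.ndarray:
    """Hypothetical helper: dilate a binary mask with a disk of radius r pixels.

    r = 0 returns the precise mask unchanged (as a boolean copy).
    The disk-shaped structuring element is an assumption of this sketch.
    """
    m0 = m0.astype(bool)
    if r == 0:
        return m0.copy()
    h, w = m0.shape
    # Pad with background so shifted views never wrap around the border.
    padded = np.pad(m0, r, mode="constant", constant_values=False)
    out = np.zeros_like(m0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx <= r * r:  # offset lies inside the disk
                out |= padded[r + dy : r + dy + h, r + dx : r + dx + w]
    return out


def bbox_mask(m0: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fill the tight bounding box of the foreground."""
    m0 = m0.astype(bool)
    out = np.zeros_like(m0)
    ys, xs = np.nonzero(m0)
    if ys.size:  # leave an empty mask empty
        out[ys.min() : ys.max() + 1, xs.min() : xs.max() + 1] = True
    return out


# Usage: a small square object, a dilated variant, and its bounding-box mask.
m0 = np.zeros((16, 16), dtype=bool)
m0[6:10, 6:10] = True
m3 = dilate_mask(m0, 3)   # coarser contour than m0
m_inf = bbox_mask(m0)     # only the bounding box of the object survives
```

A larger `r` removes more contour detail while always covering the original object, which matches the captions' reading of $r$ as a deterioration level; for a non-rectangular object, the bounding-box mask would discard the contour entirely.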

Theorems & Definitions (2)

  • definition 1: Layout Consistency, LC
  • definition 2: Semantic Retrieval, SR