RAD: Region-Aware Diffusion Models for Image Inpainting

Sora Kim, Sungho Suh, Minsik Lee

TL;DR

Region-aware diffusion models (RAD) reformulate diffusion for image inpainting by assigning a separate noise schedule to each pixel, which enables asynchronous region generation and faster inference without additional conditioning modules. The method has four components (pixel-wise forward/reverse processes, spatially variant noise schedules, spatial noise embedding, and inverse-mapping of noise intensities) and uses Perlin-noise masks to train on diverse inpainting patterns. RAD achieves state-of-the-art qualitative and quantitative results on FFHQ, LSUN Bedroom, and ImageNet, with inference up to 100x faster than state-of-the-art approaches, and it can be fine-tuned from pretrained diffusion models with LoRA to reduce training cost. This approach provides a simple yet effective alternative to conditioning-based or loop-heavy diffusion inpainting methods, with potential extensions to editing and conditioning tasks in the future.

Abstract

Diffusion models have achieved remarkable success in image generation, with applications broadening across various domains. Inpainting is one such application that can benefit significantly from diffusion models. Existing methods either hijack the reverse process of a pretrained diffusion model or cast the problem into a larger framework, i.e., conditioned generation. However, these approaches often require nested loops in the generation process or additional components for conditioning. In this paper, we present region-aware diffusion models (RAD) for inpainting with a simple yet effective reformulation of the vanilla diffusion models. RAD utilizes a different noise schedule for each pixel, which allows local regions to be generated asynchronously while considering the global image context. A plain reverse process requires no additional components, enabling RAD to achieve inference time up to 100 times faster than the state-of-the-art approaches. Moreover, we employ low-rank adaptation (LoRA) to fine-tune RAD based on other pretrained diffusion models, reducing computational burdens in training as well. Experiments demonstrated that RAD provides state-of-the-art results both qualitatively and quantitatively, on the FFHQ, LSUN Bedroom, and ImageNet datasets.
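
The idea of a different noise schedule for each pixel can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes a standard linear beta schedule, a binary mask (1 marks a pixel to inpaint), and the usual closed-form forward step applied element-wise. The function names `alpha_bar`, `pixelwise_alpha_bar`, and `forward_noise` are hypothetical, and the spatially variant schedules used in RAD may differ substantially from this on/off choice.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta) over the first t steps of a linear beta schedule (assumed)."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / max(num_steps - 1, 1)
        prod *= 1.0 - beta
    return prod


def pixelwise_alpha_bar(mask, t, num_steps):
    """Per-pixel alpha-bar map: masked pixels (1) follow the schedule, known pixels (0) stay clean."""
    a = alpha_bar(t, num_steps)
    return [[a if m else 1.0 for m in row] for row in mask]


def forward_noise(x0, abar_map, rng):
    """Apply x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps independently at every pixel."""
    return [
        [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
         for x, a in zip(x_row, a_row)]
        for x_row, a_row in zip(x0, abar_map)
    ]


# Toy 2x2 "image": the right column is the region to inpaint.
x0 = [[0.5, -0.25], [0.75, 0.0]]
mask = [[0, 1], [0, 1]]
abar_map = pixelwise_alpha_bar(mask, t=1000, num_steps=1000)
xt = forward_noise(x0, abar_map, random.Random(0))
```

In this toy setup the known pixels pass through unchanged while the masked column is almost pure noise at the final step, so a plain reverse process would only have to denoise the masked region while still seeing the clean context.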

Paper Structure

This paper contains 22 sections, 7 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Region-aware diffusion models (RAD) in action.
  • Figure 2: An overview of the proposed method. RAD consists of four components: (1) the forward and reverse processes based on pixel-wise noise (\ref{sec:RAD}); (2) spatially variant noise schedules (\ref{sec:noise_schedule}); (3) spatial noise embedding (\ref{sec:noise_encoding}); and (4) the inverse-mapping of $\bar{b}$ (\ref{sec:practical}).
  • Figure 3: Examples of inpainting masks based on Perlin noise.
  • Figure 4: Qualitative comparisons. Colored areas indicate inpainting regions (1st/2nd rows: box, 3rd/4th: extreme, 5th/6th: wide).
  • Figure 5: Example results of RAD on LSUN Bedroom and FFHQ.
  • ...and 1 more figure