Diffusion-based image inpainting with internal learning

Nicolas Cherel, Andrés Almansa, Yann Gousseau, Alasdair Newson

TL;DR

This paper introduces lightweight diffusion-based inpainting models trained on a single image or a few images (internal learning) to overcome the high computational cost of traditional diffusion approaches. By conditioning the reverse diffusion on observed regions and masks, and using a compact UNet that predicts $x_0$, the method achieves competitive realism across textures, line drawings, and SVBRDF with dramatically reduced training and inference time. Key contributions include a detailed framework for patch- and single-image training, a 160k-parameter architecture without attention, and strong empirical results showing state-of-the-art realism in constrained modalities with far lower resource requirements. The practical impact lies in enabling fast, interactive, modality-specific inpainting when large external datasets are unavailable or impractical to use.
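
For readers unfamiliar with $x_0$-prediction, one conditional reverse step can be sketched under the standard DDPM parameterization. The symbols $y$ (observed image), $m$ (mask), $\beta_t$, $\alpha_t$, $\bar\alpha_t$, $\sigma_t$ and the network name $f_\theta$ are assumptions made for illustration; the paper's own equations are not reproduced on this page and may use different notation or a different sampler.

$$
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\,I\right),
\qquad \alpha_t = 1-\beta_t,\quad \bar\alpha_t = \prod_{s \le t} \alpha_s
$$

$$
\hat{x}_0 = f_\theta(x_t,\, y,\, m,\, t)
$$

$$
\mu_t = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,\hat{x}_0
      + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,
\qquad
x_{t-1} = \mu_t + \sigma_t z,\quad
\sigma_t^2 = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t,\quad z \sim \mathcal{N}(0, I)
$$

In this form the network outputs a clean-image estimate $\hat{x}_0$ directly rather than the noise, and the conditioning on $y$ and $m$ enters only through the network inputs.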

Abstract

Diffusion models are now the undisputed state-of-the-art for image generation and image restoration. However, they require large amounts of computational power for training and inference. In this paper, we propose lightweight diffusion models for image inpainting that can be trained on a single image, or a few images. We show that our approach competes with large state-of-the-art models in specific cases. We also show that training a model on a single image is particularly relevant for image acquisition modalities that differ from the RGB images of standard learning databases. We show results in three different contexts: texture images, line drawing images, and material BRDFs, for which we achieve state-of-the-art results in terms of realism, with a computational load that is greatly reduced compared to concurrent methods.

Paper Structure (12 sections, 4 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: UNet architecture used for our experiments. The different inputs are first concatenated before being processed by fully convolutional layers.
  • Figure 2: Inpainting results for 4 methods (see text). Results from DeepFill are the least satisfying. Results from RePaint are good but sometimes lack sharpness, have the wrong scale, or hallucinate content (4th column). Patch and our method yield the best results in these cases.
  • Figure 3: For simple completions (top), our method performs as well as RePaint and better than the method of Newson et al. For complex completions with no obvious ground-truth, all results are good.
  • Figure 4: Inpainting results for SVBRDFs. The properties of the materials are correlated across maps, so inpainting should maintain this correlation. Our method preserves this property, while non-specific methods like RePaint do not (see zoom-in with borders circled in red, where the short lines are not in coherent positions). RePaint's render looks flat. The specular map is omitted as this material has a uniform specularity which is correctly inpainted by all approaches.
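
The input concatenation mentioned in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It assumes the inputs are the noisy image, the masked observation, and the mask, stacked along the channel axis. The function name `build_network_input`, the mask convention (1 = pixel to inpaint), and the array shapes are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def build_network_input(x_t, observed, mask):
    """Stack noisy image, masked observation, and mask along the channel axis.

    x_t:      (C, H, W) noisy image at the current diffusion step
    observed: (C, H, W) image whose known pixels condition the model
    mask:     (1, H, W) with 1 = pixel to inpaint, 0 = observed (assumed convention)
    """
    # Zero out the region to be filled so the network never sees its content.
    masked_observation = observed * (1.0 - mask)
    return np.concatenate([x_t, masked_observation, mask], axis=0)

rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))            # toy RGB image
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0                  # square hole in the middle
x_t = rng.standard_normal((3, 8, 8))     # stand-in for a noisy sample

net_input = build_network_input(x_t, image, mask)
print(net_input.shape)  # (7, 8, 8): 3 noisy + 3 observed + 1 mask channels
```

For multi-map data such as SVBRDFs, the same pattern would apply with a larger channel count `C`, which is one way a single network can see all maps jointly.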