ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation

Stanislav Frolov, Brian B. Moser, Sebastian Palacio, Andreas Dengel

TL;DR

ObjBlur introduces a novel curriculum learning strategy for layout-to-image generation. It progressively blurs either the objects or the background of each training image, with the blur strength set by a schedule $\mathtt{s}(t)$ over training steps, guiding models from easy to hard visual signals without architectural changes. By gradually reducing blur strength across training, ObjBlur stabilizes training, reduces variance across runs, and achieves state-of-the-art results on COCO-Stuff and Visual Genome with both GAN and diffusion backbones. The method is simple to implement in the data loader and is compatible with existing layout-to-image models, including diffusion-based approaches. It improves global image fidelity (FID), object fidelity (SceneFID), and classifier-based object recognizability (CAS). These findings demonstrate the potential of curriculum learning in generative vision, enabling more reliable training and higher-quality image synthesis from structured layouts. In practice, this improves layout-to-image pipelines for complex scenes and opens a path to further curriculum-based augmentations in generative modeling.
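The schedule $\mathtt{s}(t)$ is named here but its functional form is not given. As a minimal sketch, assume a linear decay of blur strength from 1.0 (strongest blur) to 0.0 (clean image) over the first 95% of training, the schedule duration reported as best in Figure 3(c). The linear shape and the function name `blur_schedule` are assumptions for illustration, not the authors' definition.

```python
def blur_schedule(step: int, total_steps: int, schedule_fraction: float = 0.95) -> float:
    """Hypothetical linear schedule s(t) returning a blur strength in [0, 1].

    Strength starts at 1.0 (strongest blur) and decays linearly to 0.0 at
    schedule_fraction * total_steps; after that, images are left clean.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    end = schedule_fraction * total_steps
    if step >= end:
        return 0.0
    return 1.0 - step / end


if __name__ == "__main__":
    # With 200 epochs, the schedule ends at epoch 190 (95% of training).
    for epoch in (0, 95, 190, 199):
        print(epoch, blur_schedule(epoch, 200))  # 1.0, 0.5, 0.0, 0.0
```

A linear decay is the simplest monotone easy-to-hard curriculum; other monotone shapes (e.g. cosine or step-wise) would fit the same interface.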

Abstract

We present ObjBlur, a novel curriculum learning approach to improve layout-to-image generation models, where the task is to produce realistic images from layouts composed of boxes and labels. Our method is based on progressive object-level blurring, which effectively stabilizes training and enhances the quality of generated images. This curriculum learning strategy systematically applies varying degrees of blurring to individual objects or the background during training, starting with strong blurring and progressing toward cleaner images. Our findings reveal that this approach yields significant performance improvements, more stable training, smoother convergence, and reduced variance between multiple runs. Moreover, our technique is compatible with both generative adversarial networks and diffusion models, underlining its applicability across generative modeling paradigms. With ObjBlur, we reach new state-of-the-art results on the complex COCO and Visual Genome datasets.
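To make the object-level blurring concrete, here is a minimal sketch assuming NumPy, (H, W, C) float images, and boxes given in pixel coordinates as (x0, y0, x1, y1). It assumes the blur is produced by block-averaging the image down to a low resolution (echoing the low-resolution image $\mathbf{x}_{\textbf{LR}}$ mentioned in Figure 3) and blending it with the original by a strength in [0, 1]. The functions `low_res` and `object_level_blur` and the linear blending rule are hypothetical, not the authors' implementation.

```python
import numpy as np


def low_res(image: np.ndarray, res: int) -> np.ndarray:
    """Downsample an (H, W, C) image to res x res by block averaging,
    then upsample back to (H, W, C) with nearest-neighbour repetition."""
    h, w, c = image.shape
    if h % res or w % res:
        raise ValueError("H and W must be divisible by res in this sketch")
    bh, bw = h // res, w // res
    small = image.reshape(res, bh, res, bw, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, bh, axis=0), bw, axis=1)


def object_level_blur(image, boxes, strength, blur_objects, res=8):
    """Blur inside the boxes (blur_objects=True) or outside them (False).

    strength in [0, 1]: 1.0 replaces the region with its low-res version,
    0.0 leaves the image unchanged (hypothetical linear blend).
    """
    image = np.asarray(image, dtype=np.float64)
    mask = np.zeros(image.shape[:2], dtype=bool)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = True
    if not blur_objects:
        mask = ~mask  # blur the background instead of the objects
    blurred = (1.0 - strength) * image + strength * low_res(image, res)
    return np.where(mask[..., None], blurred, image)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))
    out = object_level_blur(img, [(8, 8, 40, 40)], strength=0.7, blur_objects=True)
    print(out.shape)  # (64, 64, 3)
```

Because the transform only touches the image tensor and the layout boxes, it can live in a data loader without changing the generator or discriminator.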

Paper Structure (28 sections, 3 equations, 11 figures, 3 tables, 1 algorithm)

Figures (11)

  • Figure 1: Comparison of FID during training. ObjBlur stabilizes training, leading to smoother convergence with better final performance and lower standard deviation across three runs, especially at the end of training.
  • Figure 2: Our ObjBlur method is a novel curriculum learning approach that applies progressive object-level blurring to either individual objects or the background, on a per-sample basis throughout the training procedure. At each training step $t_i$, the blurring schedule function $\mathtt{s}(t_i)$ computes the current blurring strength $s_i$, moving from strong blurring to progressively cleaner images. The probability $p_{\text{obj}}$ then determines whether blurring is applied to the objects or to the background of the current image. More details in the Method section.
  • Figure 3: (a) We test different initial blurring strengths, corresponding to the starting image resolution used to compute $\mathbf{x}_{\textbf{LR}}$, and find that 4 and 8 perform best. (b) We analyze the object blur probability $p_{\text{obj}}$, which defines the ratio between object and background blurring, and find that 50% works best; blurring objects too often hurts performance. (c) We study the schedule duration, i.e., the portion of training during which the blurring schedule is applied, and find that 95% of training time, corresponding to 190 out of 200 epochs, performs best.
  • Figure 4: Visual comparison of images generated with and without ObjBlur during training. Images from the model trained with ObjBlur are subjectively better, with more fine-grained details, better textures, more recognizable objects, and higher global image coherence.
  • Figure 5:
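Figures 2 and 3 describe the per-sample choice between object and background blurring, governed by $p_{\text{obj}}$, together with the best-performing settings. A hypothetical configuration collecting those reported values, plus a per-sample draw of the blur target, could look like the following. `ObjBlurConfig` and `sample_blur_target` are illustrative names, and drawing the target with an independent uniform random number per image is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjBlurConfig:
    start_resolution: int = 8        # Fig. 3(a): 4 and 8 perform best
    p_obj: float = 0.5               # Fig. 3(b): 50% works best
    schedule_fraction: float = 0.95  # Fig. 3(c): 95% of training time
    total_epochs: int = 200

    def schedule_end_epoch(self) -> int:
        # 0.95 * 200 = 190: the schedule spans the first 190 of 200 epochs
        return round(self.schedule_fraction * self.total_epochs)


def sample_blur_target(cfg: ObjBlurConfig, rng: random.Random) -> str:
    """Blur the objects with probability p_obj, otherwise the background."""
    return "objects" if rng.random() < cfg.p_obj else "background"


if __name__ == "__main__":
    cfg = ObjBlurConfig()
    rng = random.Random(0)
    draws = [sample_blur_target(cfg, rng) for _ in range(10_000)]
    print(cfg.schedule_end_epoch(), draws.count("objects") / len(draws))
```

With $p_{\text{obj}} = 0.5$, roughly half of the training images have their objects blurred and the rest their background, matching the ratio Figure 3(b) reports as best.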