Diffusion-based Data Augmentation for Object Counting Problems

Zhen Wang; Yuelei Li; Jia Wan; Nuno Vasconcelos

TL;DR

This work addresses the data scarcity challenge in dense crowd counting by proposing a diffusion-based data augmentation pipeline that conditions image generation on head-location dot maps. It introduces a smoothed density map input for ControlNet, a counting loss to enforce correspondence between dots and generated crowds, and counting-guided sampling to steer diffusion toward accurate regions. The approach demonstrates improved counting performance across ShanghaiTech, NWPU-Crowd, UCF-QNRF, and TRANCOS, and shows versatility by extending to vehicle counting. The framework is adaptable to different counting problems and offers a practical pathway to enhance generalization when labeled data is limited.

Abstract

Crowd counting is an important problem in computer vision due to its wide range of applications in image understanding. The problem is currently addressed mainly with deep learning approaches, such as Convolutional Neural Networks (CNNs) and Transformers. However, deep networks are data-driven and prone to overfitting, especially when the available labeled crowd data is limited. To overcome this limitation, we design a pipeline that uses a diffusion model to generate extensive training data. We are the first to generate images with a diffusion model conditioned on a location dot map (a binary dot map that specifies the locations of human heads). We are also the first to use such diverse synthetic data to augment crowd counting models. Our proposed smoothed density map input for ControlNet significantly improves ControlNet's ability to generate crowds in the correct locations. Our proposed counting loss for the diffusion model reduces the discrepancies between the location dot map and the generated crowd images. In addition, our guidance sampling directs the diffusion process toward regions where the generated crowd images align most accurately with the location dot map. Together, these components enhance ControlNet's ability to generate specified objects from a location dot map, which can be used for data augmentation in various counting problems. The framework is versatile and can be easily adapted to other counting problems. Extensive experiments demonstrate that our framework improves counting performance on the ShanghaiTech, NWPU-Crowd, UCF-QNRF, and TRANCOS datasets.
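The abstract mentions a smoothed density map derived from the location dot map. The following is a minimal sketch of that general idea, assuming a fixed-width Gaussian kernel; the kernel shape, the value `sigma=2.0`, and the helper names `dot_map_from_points`, `gaussian_kernel_1d`, and `smooth_dot_map` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def dot_map_from_points(points, height, width):
    """Build a dot map with 1.0 at each (row, col) head location."""
    dot_map = np.zeros((height, width), dtype=np.float64)
    for row, col in points:
        dot_map[row, col] += 1.0
    return dot_map


def gaussian_kernel_1d(sigma, truncate=4.0):
    """Normalized 1D Gaussian kernel (sums to 1), truncated at truncate * sigma."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_dot_map(dot_map, sigma=2.0):
    """Blur the dot map with a separable Gaussian (sigma is an assumed value).

    Because the kernel is normalized, each dot spreads into a blob of total
    mass 1, so the map still sums to the object count as long as the dots
    are farther from the border than the kernel radius.
    """
    kernel = gaussian_kernel_1d(sigma)
    # Convolve along rows' axis first, then along columns' axis.
    blurred = np.apply_along_axis(np.convolve, 0, dot_map, kernel, mode="same")
    blurred = np.apply_along_axis(np.convolve, 1, blurred, kernel, mode="same")
    return blurred


# Hypothetical head locations, kept away from the image border.
points = [(20, 20), (20, 24), (40, 45)]
dot_map = dot_map_from_points(points, height=64, width=64)
density = smooth_dot_map(dot_map, sigma=2.0)
```

In this sketch the smoothed map spreads each dot over a small neighborhood while keeping the total count, which gives a conditioning signal that is less sparse than isolated single-pixel dots.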

Paper Structure (12 sections, 15 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Crowd Counting
Data Augmentation
Diffusion Models
Method
Background
Labeling-Free Data Augmentation
Experiments
Settings
Results
Conclusion

Figures (7)

Figure 1: Generated images in various crowd density levels: Our model is able to generate crowd images from head location maps with various density levels, as illustrated in this figure. Please zoom in for details.
Figure 2: Overview of Training Pipeline: We propose a diffusion-based data augmentation framework for counting models. During training, the loss $L_{\text{c}}$ supervises the prediction of $\epsilon$, the gradient of the log data density, and a counting loss $L_{\text{Count}}$ enforces the correspondence between the input dot map and the generated image.
Figure 3: Comparative Analysis of Real vs. Synthetic Crowd Images given Crowd Position Map: Samples from the NWPU dataset, with green points indicating pre-determined crowd positions. The crowd distribution is the same in the real image and the corresponding synthetic image. Please zoom in for details.
Figure 4: Crowd images generated from a head location map and text prompts that control the background, with green points indicating pre-determined crowd positions. Please zoom in for details.
Figure 5: Synthetic data ratio effect on MAE for the STEERER model on ShanghaiTech Part A.
...and 2 more figures
