Multimodal Crowd Counting with Pix2Pix GANs

Muhammad Asif Khan, Hamid Menouar, Ridha Hamila

TL;DR

The paper addresses the scarcity of multimodal data for crowd counting under poor illumination by using a Pix2Pix GAN to synthesize thermal infrared (TIR) imagery from RGB inputs. It introduces MMCount, a two-branch network that fuses RGB and TIR information to produce density maps, where the TIR input is either captured by a real sensor or synthesized by the GAN. Evaluations on DroneRGBT, ShanghaiTech Part-B, and CARPK show that adding synthetic TIR improves counting accuracy over RGB-only baselines, and that generated TIR can approach the performance of real TIR data. The approach makes multimodal crowd counting practical in low-light conditions when paired thermal data is unavailable, and the paper suggests lightweight real-time GANs and broader cross-scene training as future directions.

Abstract

Most state-of-the-art crowd counting methods use color (RGB) images to learn the density map of the crowd. However, these methods often struggle to achieve high accuracy in densely crowded scenes with poor illumination. Recently, some studies have reported improvement in the accuracy of crowd counting models using a combination of RGB and thermal images. Although multimodal data can lead to better predictions, such data might not always be available beforehand. In this paper, we propose the use of generative adversarial networks (GANs) to automatically generate thermal infrared (TIR) images from color (RGB) images and use both to train crowd counting models to achieve higher accuracy. We first use a Pix2Pix GAN to translate RGB images to TIR images. Our experiments on several state-of-the-art crowd counting models and benchmark crowd datasets report significant improvement in accuracy.

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Illustration of counting prediction on a single sample: RGB image (top-left), corresponding TIR image (top-right), ground truth density map (bottom-left), and estimated density map (bottom-right).
  • Figure 2: The proposed method for multimodal crowd counting using RGB+TIR images. The TIR images are generated by a Pix2Pix GAN trained earlier on RGB+TIR paired datasets.
  • Figure 3: Thermal infrared (TIR) images generated using Pix2Pix GAN.
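
The density maps mentioned in Figure 1 encode the crowd count as the sum over all pixels. The snippet below is a minimal sketch of that idea, assuming the common convention of placing one normalized Gaussian at each annotated position. The function name `density_map`, the example points, and the value of `sigma` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def density_map(points, shape, sigma=4.0):
    """Build a density map with one unit-mass Gaussian per annotated point.

    points: iterable of (row, col) positions (hypothetical annotations).
    shape:  (height, width) of the output map.
    sigma:  Gaussian spread in pixels (assumed value, not from the paper).
    """
    height, width = shape
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    dmap = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Normalize inside the image so each point contributes exactly 1.
        dmap += g / g.sum()
    return dmap


# Three hypothetical annotations on a 64x64 image.
points = [(10, 12), (30, 40), (31, 42)]
gt = density_map(points, shape=(64, 64))

# The count is recovered by summing the map.
count = gt.sum()
print(round(count, 6))
```

A counting network is trained to regress a map of this kind from its input, and the predicted count is obtained by summing its output in the same way.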