CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering

Xiaohan Sun, Yinghan Xu, John Dingliana, Carol O'Sullivan

TL;DR

CrowdSplat tackles the challenge of real-time rendering of large, realistic crowds by using 3D Gaussian Splatting to represent animated avatars reconstructed from monocular videos. It introduces a two-stage pipeline, avatar reconstruction followed by crowd synthesis, with LoD and shared Gaussian parameters to reduce memory while maintaining visual fidelity. The system demonstrates 3,500 characters rendered in real time (31 FPS on an RTX 4090) using 14 avatar templates and a distance-based Gaussian budget (202k near, 12k mid, 3k far). Quantitative metrics (LPIPS, PSNR) show that near-field quality benefits from higher Gaussian counts, while distant views remain visually consistent; memory usage scales with crowd size. The authors also outline future directions, including hybrid rendering with impostors, perceptual studies, and text-to-crowd generation to broaden applicability.
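The distance-based Gaussian budget described above can be illustrated with a minimal sketch. The Gaussian counts are the ones reported in the figure captions of this summary; the 5 m and 10 m cut-offs and the function names `select_lod` and `total_gaussians` are assumptions made for illustration, not values or APIs confirmed by the source.

```python
from bisect import bisect_right

# Gaussian counts for the near, mid and far levels, as reported in the
# figure captions (202,738 / 12,661 / 3,176).
LOD_GAUSSIANS = (202_738, 12_661, 3_176)

# Hypothetical switching distances in metres. The source evaluates quality at
# 5 m and 10 m but does not state that these are the actual LoD thresholds.
LOD_THRESHOLDS_M = (5.0, 10.0)


def select_lod(distance_m: float) -> int:
    """Return the LoD index for a camera distance: 0 = near, 1 = mid, 2 = far."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right counts how many thresholds are <= distance_m.
    return bisect_right(LOD_THRESHOLDS_M, distance_m)


def total_gaussians(distances_m) -> int:
    """Sum the Gaussian budget over all characters, given their camera distances."""
    return sum(LOD_GAUSSIANS[select_lod(d)] for d in distances_m)


if __name__ == "__main__":
    crowd = [1.0, 7.5, 50.0]  # one character in each distance band
    print([select_lod(d) for d in crowd], total_gaussians(crowd))
```

Under these assumed thresholds, a character pushed from the near band to the far band drops from 202,738 to 3,176 Gaussians, which is where the rendering and memory savings in a large crowd would come from.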

Abstract

We present CrowdSplat, a novel approach that leverages 3D Gaussian Splatting for real-time, high-quality crowd rendering. Our method utilizes 3D Gaussian functions to represent animated human characters in diverse poses and outfits, which are extracted from monocular videos. We integrate Level of Detail (LoD) rendering to optimize computational efficiency and quality. The CrowdSplat framework consists of two stages: (1) avatar reconstruction and (2) crowd synthesis. The framework is also optimized for GPU memory usage to enhance scalability. Quantitative and qualitative evaluations show that CrowdSplat achieves good levels of rendering quality, memory efficiency, and computational performance. Through these experiments, we demonstrate that CrowdSplat is a viable solution for dynamic, realistic crowd simulation in real-time applications.

Paper Structure

This paper contains 3 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Screenshot of CrowdSplat running at 31 FPS on an RTX 4090 with 3,500 animated characters.
  • Figure 2: Overview of CrowdSplat. The first stage combines the estimated SMPL body poses and images with a UV positional map. This process fits the 3D Gaussian attributes for each sampled point on the SMPL mesh template, reconstructing a Gaussian avatar template. The second stage uses Linear Blend Skinning (LBS) to animate multiple crowd characters, using an LoD technique for memory and rendering speed optimization. We reconstruct 14 avatar templates in the first stage and randomly duplicate these templates to 3,500 characters in the second stage.
  • Figure 3: Qualitative rendered image comparison for different resolutions. (a) Ground truth test image, (b) Rendered image with 202,738 Gaussians, (c) Rendered image with 12,661 Gaussians, (d) Rendered image with 3,176 Gaussians.
  • Figure 4: Quantitative results of rendering with different numbers of Gaussians at different distances. The LPIPS metric (Zhang et al., 2018) indicates that, within 5 meters, rendering with 202,738 or 12,661 Gaussians gives higher quality than rendering with 3,176 Gaussians, while at 10 meters the three are similar, demonstrating the potential of CrowdSplat for Level of Detail (LoD) rendering.
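
The Figure 2 caption says the second stage animates characters with Linear Blend Skinning (LBS). As a rough illustration of that general technique, the sketch below skins a set of point positions (for example, Gaussian centers) as a weighted blend of per-bone rigid transforms, x' = Σ_j w_j (R_j x + t_j). The helper names `rigid_z`, `apply_transform`, and `lbs` are hypothetical, and the sketch covers positions only; how CrowdSplat assigns skinning weights or updates Gaussian rotations and covariances is not described in this summary.

```python
import math


def rigid_z(angle_rad, translation):
    """Build a 3x4 rigid transform [R | t] that rotates about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz]]


def apply_transform(transform, point):
    """Apply a 3x4 rigid transform to a 3D point."""
    x, y, z = point
    return [r[0] * x + r[1] * y + r[2] * z + r[3] for r in transform]


def lbs(points, weights, bone_transforms):
    """Linear Blend Skinning of positions.

    points: list of 3D points in the rest pose.
    weights: per-point list of bone weights (each row should sum to 1).
    bone_transforms: one 3x4 rigid transform per bone.
    """
    skinned = []
    for point, point_weights in zip(points, weights):
        blended = [0.0, 0.0, 0.0]
        for w, transform in zip(point_weights, bone_transforms):
            moved = apply_transform(transform, point)
            for k in range(3):
                blended[k] += w * moved[k]
        skinned.append(blended)
    return skinned


if __name__ == "__main__":
    bones = [rigid_z(0.0, (0.0, 0.0, 0.0)),   # bone 0 stays in place
             rigid_z(0.0, (2.0, 0.0, 0.0))]   # bone 1 shifts by 2 along x
    rest = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    w = [(1.0, 0.0), (0.5, 0.5)]
    print(lbs(rest, w, bones))
```

A point bound entirely to one bone follows that bone rigidly, while a point with split weights lands between the two bone-transformed positions, which is what gives smooth deformation around joints.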