Phys-3D: Physics-Constrained Real-Time Crowd Tracking and Counting on Railway Platforms

Bin Zeng, Johannes Künzel, Anna Hilsmann, Peter Eisert

TL;DR

The results show that incorporating first-principles geometry and motion priors enables reliable crowd counting in safety-critical transportation scenarios, facilitating effective train scheduling and platform safety management.

Abstract

Accurate, real-time crowd counting on railway platforms is essential for safety and capacity management. We propose using a single camera mounted on a train that scans the platform as the train arrives. Although the hardware setup is simple, counting remains challenging due to dense occlusions, camera motion, and perspective distortion during train arrivals. Most existing tracking-by-detection approaches assume static cameras or ignore physical consistency in motion modeling, which makes counting unreliable under dynamic conditions. We introduce a physics-constrained tracking framework that unifies detection, appearance, and 3D motion reasoning in a real-time pipeline. Our approach integrates a transfer-learned YOLOv11m detector with EfficientNet-B0 appearance encoding within DeepSORT and introduces a physics-constrained Kalman model (Phys-3D) that enforces physically plausible 3D motion dynamics through pinhole geometry. To address counting brittleness under occlusions, we implement a virtual counting band with persistence. On our platform benchmark, the MOT-RailwayPlatformCrowdHead Dataset (MOT-RPCH), our method reduces counting error to 2.97%, demonstrating robust performance despite motion and occlusions. Our results show that incorporating first-principles geometry and motion priors enables reliable crowd counting in safety-critical transportation scenarios, facilitating effective train scheduling and platform safety management.
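The abstract names the ingredients of Phys-3D (a Kalman model, 3D motion, pinhole geometry, ego-motion priors) but not its equations. The following is a hypothetical sketch of one way to combine them, not the authors' implementation: an extended Kalman filter whose state is a head position and velocity in camera coordinates, whose prediction subtracts a known camera (train) velocity, and whose measurement is the detected head centre projected through a pinhole model. Everything beyond that is an assumption: the class and method names, the state layout, the camera axes being aligned with the train's motion without rotation, depth initialisation from an assumed real head height of 0.25 m, a 2.5 m/s cap on pedestrian speed as the "physical plausibility" constraint, and all noise values.

```python
import numpy as np


class Phys3DKalmanSketch:
    """Hypothetical EKF: 3D constant-velocity head state in camera coordinates,
    ego-motion compensation, and a pinhole measurement model."""

    def __init__(self, fx, fy, cx, cy, dt, v_ego, max_speed=2.5):
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        self.dt = dt
        # Assumed known camera (train) velocity in camera axes, m/s.
        self.v_ego = np.asarray(v_ego, dtype=float)
        self.max_speed = max_speed  # assumed upper bound on walking speed, m/s
        self.x = np.zeros(6)  # [X, Y, Z, VX, VY, VZ]; V* is the pedestrian's own velocity
        self.P = np.eye(6)
        self.Q = np.diag([0.01, 0.01, 0.05, 0.1, 0.1, 0.1])  # assumed process noise
        self.R = np.diag([4.0, 4.0])  # assumed pixel measurement noise (px^2)

    def initiate(self, u, v, box_h_px, head_h_m=0.25):
        # Pinhole similar triangles: Z = fx * real_height / pixel_height.
        Z = self.fx * head_h_m / box_h_px
        X = (u - self.cx) * Z / self.fx
        Y = (v - self.cy) * Z / self.fy
        self.x = np.array([X, Y, Z, 0.0, 0.0, 0.0])
        self.P = np.diag([0.5, 0.5, 2.0, 1.0, 1.0, 1.0])

    def predict(self):
        F = np.eye(6)
        F[0:3, 3:6] = self.dt * np.eye(3)
        # Relative motion in the camera frame = pedestrian velocity - camera velocity.
        self.x = F @ self.x - np.concatenate([self.dt * self.v_ego, np.zeros(3)])
        self.P = F @ self.P @ F.T + self.Q
        self._clamp()

    def project(self, x=None):
        X, Y, Z = (self.x if x is None else x)[:3]
        return np.array([self.fx * X / Z + self.cx, self.fy * Y / Z + self.cy])

    def update(self, u, v):
        X, Y, Z = self.x[:3]
        # Jacobian of the pinhole projection with respect to the state.
        H = np.zeros((2, 6))
        H[0, 0] = self.fx / Z
        H[0, 2] = -self.fx * X / Z**2
        H[1, 1] = self.fy / Z
        H[1, 2] = -self.fy * Y / Z**2
        y = np.array([u, v]) - self.project()  # innovation in pixels
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ H) @ self.P
        self._clamp()

    def _clamp(self):
        # Physical plausibility: cap pedestrian speed and keep depth positive.
        speed = np.linalg.norm(self.x[3:6])
        if speed > self.max_speed:
            self.x[3:6] *= self.max_speed / speed
        self.x[2] = max(self.x[2], 0.1)


kf = Phys3DKalmanSketch(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0,
                        dt=1 / 25, v_ego=[5.0, 0.0, 0.0])
kf.initiate(960.0, 540.0, box_h_px=50.0)  # head 5 m away on the optical axis
kf.predict()  # a standing person drifts by -v_ego * dt = -0.2 m in X
print(kf.project())
```

For comparison, the standard DeepSORT filter tracks an 8-dimensional image-space state (box centre, aspect ratio, height, and their velocities); the design choice illustrated here is to move the state into metric 3D, so that train motion and speed limits can be expressed directly, and to let the pinhole projection carry the perspective effects.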

Paper Structure (17 sections, 6 equations, 4 figures, 14 tables)

Figures (4)

  • Figure 1: Two frames from a tracked video sequence. Each bounding box is annotated with a unique track ID; the bottom-left overlay shows the number of unique IDs counted in the virtual counting zone (yellow area).
  • Figure 2: Overview of the proposed physics-constrained detect–track–count pipeline. The Phys-3D model integrates geometric and ego-motion priors to achieve stable identity tracking under camera motion.
  • Figure 3: Virtual Counting Band Illustration. The diagram shows the virtual counting zones positioned near the left and right image borders. Each band is defined by start and end boundaries expressed as proportions of the image width (Start=0.05, End=0.20), creating a buffer region that tolerates brief occlusions and detection jitter. A target is counted only when it remains continuously within the band for a preset number of frames, providing robust counting under challenging conditions.
  • Figure 4: ReID Input Size Ablation Experiment Results
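The counting rule in the Figure 3 caption can be sketched as a small per-frame state machine over track IDs. This is a hypothetical illustration built only from the caption: Start=0.05 and End=0.20 come from the figure, while the rest is assumed. The right-hand band is taken to mirror the left one (0.80 to 0.95 of the width), a track's position is its box centre x-coordinate, the persistence threshold `min_frames=3` is a placeholder because the caption gives no value, and a frame in which a track is missing (for example, occluded) neither extends nor resets its streak. The class name `CountingBand` is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class CountingBand:
    start: float = 0.05  # band start as a fraction of image width (Figure 3)
    end: float = 0.20    # band end as a fraction of image width (Figure 3)
    min_frames: int = 3  # hypothetical persistence threshold
    streak: dict[int, int] = field(default_factory=dict)  # consecutive in-band frames per ID
    counted: set[int] = field(default_factory=set)        # unique IDs already counted

    def in_band(self, x_center: float, width: float) -> bool:
        r = x_center / width
        left = self.start <= r <= self.end
        right = 1.0 - self.end <= r <= 1.0 - self.start  # assumed mirrored right band
        return left or right

    def update(self, tracks: Iterable[tuple[int, float]], width: float) -> int:
        """tracks: (track_id, box_center_x_px) pairs for the current frame.
        Returns the running unique-ID count."""
        for tid, xc in tracks:
            if tid in self.counted:
                continue  # each ID is counted at most once
            if self.in_band(xc, width):
                self.streak[tid] = self.streak.get(tid, 0) + 1
                if self.streak[tid] >= self.min_frames:
                    self.counted.add(tid)
            else:
                self.streak[tid] = 0  # leaving the band breaks the streak
        return len(self.counted)


band = CountingBand()
for frame in ([(1, 100.0), (2, 500.0)],) * 3:  # ID 1 in the left band, ID 2 mid-image
    total = band.update(frame, width=1000.0)
print(total, band.counted)
```

Counting a set of IDs rather than band entries is what makes the result a unique-ID count, as in the Figure 1 overlay; the persistence threshold trades off missed short-lived tracks against double counting from jittery or fragmented ones.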