Recovering Origin Destination Flows from Bus CCTV: Early Results from Nairobi and Kigali

Nthenya Kyatha, Jay Taneja

TL;DR

Buses in sub-Saharan Africa (SSA) lack reliable passenger-flow data; this paper presents a baseline CCTV-based origin-destination (OD) inference pipeline that reuses existing onboard cameras and telematics. The system combines per-camera detection and tracking (YOLOv12/BotSORT/OSNet), OCR timestamp extraction, ROI-based door counting, and cross-camera association to build OD matrices. It performs well under low-density, well-lit conditions but degrades significantly under overcrowding and modality shifts such as color-to-monochrome footage, revealing deployment-specific failure modes. Door-state aware counting and simple Re-ID enhancements substantially improve exit accuracy and overall OD quality, underscoring both the potential of and the remaining challenges for scalable SSA transit analytics.

Abstract

Public transport in sub-Saharan Africa (SSA) often operates in overcrowded conditions where existing automated systems fail to capture reliable passenger flow data. Leveraging onboard CCTV already deployed for security, we present a baseline pipeline that combines YOLOv12 detection, BotSORT tracking, OSNet embeddings, OCR-based timestamping, and telematics-based stop classification to recover bus origin--destination (OD) flows. On annotated CCTV segments from Nairobi and Kigali buses, the system attains high counting accuracy under low-density, well-lit conditions (recall $\approx$95\%, precision $\approx$91\%, F1 $\approx$93\%). It produces OD matrices that closely match manual tallies. Under realistic stressors such as overcrowding, color-to-monochrome shifts, posture variation, and non-standard door use, performance degrades sharply (e.g., $\sim$40\% undercount in peak-hour boarding and a $\sim$17 percentage-point drop in recall for monochrome segments), revealing deployment-specific failure modes and motivating more robust, deployment-focused Re-ID methods for SSA transit.
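
As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall. Writing $P$ for precision and $R$ for recall (notation introduced here, not taken from the paper) and using the rounded values from the abstract:

$$
F_1 = \frac{2PR}{P + R} \approx \frac{2 \times 0.91 \times 0.95}{0.91 + 0.95} = \frac{1.729}{1.86} \approx 0.93
$$

This agrees with the stated F1 of $\approx$93\%.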

Paper Structure

This paper contains 23 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Method comparison across 11 annotated segments. We report entry, exit, and total accuracy (top row) and mean absolute error (MAE, in passengers; bottom row) for the baseline pipeline, the door-state aware variant (door_state), the hybrid head-only detector configuration (hybrid_det), and an all_together (no id repair) variant that combines improved components without identity repair. Entry performance is high for all methods, while exit accuracy and exit MAE improve substantially when incorporating door-state gating and related telematics cues.
  • Figure 2: Baseline two-stream pipeline for OD inference. Each camera (Cam-A at the front, Cam-B at the exit) runs YOLOv12 detection with BotSORT+OSNet tracking to produce local tracklets and ROI-based IN/OUT events. OCR-extracted timestamps and FPS align events to a per-second timeline, while telematics (wheel speed and GPS) identify official and illegal stops. These signals are fused to construct stop-level OD matrices; later sections add hybrid head-only detection and door-state aware counting on top of this baseline.
  • Figure 3: Enhanced two-stream pipeline, with new and enhanced modules highlighted in green. Front camera (Cam-A): YOLOv12 full-body detection, with head detection as a fallback in crowded scenes $\rightarrow$ BoT-SORT with re-identification (local tracking) $\rightarrow$ EMA identity repair $\rightarrow$ door- and queue-aware ROI counting. Exit camera (Cam-B): YOLOv12 full-body detection with head detection $\rightarrow$ BoT-SORT with re-identification $\rightarrow$ door-, trajectory-, and crowd-aware identity repair (gated by the door open/closed signal) $\rightarrow$ door- and queue-aware ROI counting. Shared modules: per-frame OCR timestamps and frame-rate aggregation for per-second tallies, cross-camera re-identification (A $\leftrightarrow$ B association), telematics signals for stop identification and stop classification (known vs. illegal), and final origin-destination matrix construction. This figure summarizes the full system but is not central to the experimental comparisons in the main text.
  • Figure 4: Tracking summary plots from two representative runs. Each panel reports identity consistency (IDF1), overall tracking accuracy (HOTA or MOTA, as available), identity switches, detection F1, fragmentation, association consistency, and ground-truth coverage.
  • Figure 5: Per-clip entry (top) and exit (bottom) counts across 11 annotated segments. Red bars show ground-truth counts; blue bars show the baseline, door_state, and hybrid_det variants. Blank panels (no visible bars) correspond to clips in which no events occurred during the 3--8 minute window, so all methods correctly report zero entries or exits.
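
The fusion step described in the Figure 2 caption (timestamped IN/OUT events plus telematics-derived stops, combined into a stop-level OD matrix) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that cross-camera association has already assigned each passenger a shared ID, that events are already aligned to a per-second timeline, and that each stop is given as a `(start_second, end_second)` window. The names `stop_index`, `build_od`, and `stop_windows` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def stop_index(stop_windows, t):
    """Return the index of the stop window containing second t, else None.

    stop_windows: list of (start_second, end_second), sorted and non-overlapping
    (assumed here to come from telematics-based stop detection).
    """
    starts = [start for start, _ in stop_windows]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t <= stop_windows[i][1]:
        return i
    return None


def build_od(stop_windows, in_events, out_events):
    """Build a stop-level OD count from associated boarding/alighting events.

    in_events / out_events: lists of (passenger_id, second).
    Returns (od, unmatched): od maps (origin_stop, dest_stop) -> count, and
    unmatched counts exits that could not be paired with a valid boarding.
    """
    boarded = {}
    for pid, t in in_events:
        s = stop_index(stop_windows, t)
        if s is not None:
            boarded.setdefault(pid, s)  # keep the first boarding per ID

    od = defaultdict(int)
    unmatched = 0
    for pid, t in out_events:
        dest = stop_index(stop_windows, t)
        origin = boarded.pop(pid, None)
        # Discard exits with no boarding, no stop, or a non-later stop.
        if origin is None or dest is None or dest <= origin:
            unmatched += 1
            continue
        od[(origin, dest)] += 1
    return dict(od), unmatched


if __name__ == "__main__":
    # Hypothetical toy data: three stops, three tracked passengers, one stray exit.
    stops = [(0, 30), (100, 130), (200, 230)]
    ins = [("p1", 5), ("p2", 10), ("p3", 110)]
    outs = [("p1", 105), ("p2", 210), ("p3", 215), ("p9", 220)]
    print(build_od(stops, ins, outs))
```

Tracking unmatched exits separately, rather than silently dropping them, makes identity failures visible in the output; this matters because the captions above single out exit counting as the weaker side of the baseline.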