Mean Height Aided Post-Processing for Pedestrian Detection

Jing Yuan, Tania Stathaki, Guangyu Ren

TL;DR

The paper addresses pedestrian detection under perspective distortion by introducing a perspective-aware post-processing framework called Mean Height Aided Suppression (MHAS). It couples an Existence Score Generator (ESG) with a Mean Height Generator (MHG) to supply level-wise existence priors and mean-height priors, and uses a suppression rule that jointly enforces height consistency and pedestrian likelihood across image levels. Through manual and perspective-based mean-height estimation, MHAS demonstrates consistent improvements across multiple detectors and datasets (Caltech and Citypersons), achieving state-of-the-art results in some configurations. The approach is plug-and-play, data-efficient, and highlights the value of leveraging scene-specific priors for task-focused object detection in real-world driving scenarios.

Abstract

The design of pedestrian detectors seldom considers the unique characteristics of this task and usually follows the common strategies for general object detection. To explore the potential of these characteristics, we take the perspective effect in pedestrian datasets as an example and propose the mean height aided suppression for post-processing. This method rejects predictions that fall at levels with a low possibility of containing any pedestrians or that have an abnormal height compared to the average. To achieve this, the existence score and mean height generators are proposed. Comprehensive experiments on various datasets and detectors are performed; the choice of hyper-parameters is discussed in depth. The proposed method is easy to implement and is plug-and-play. Results show that the proposed methods significantly improve detection accuracy when applied to different existing pedestrian detectors and datasets. The combination of mean height aided suppression with particular detectors outperforms state-of-the-art pedestrian detectors on Caltech and Citypersons datasets.
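
The rejection rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Box`, `level_of`, and `mean_height_suppress`, the thresholds `min_existence` and `height_tol`, the relative-deviation test, and the image convention (origin at the top, y growing downward, a box assigned to the level containing its bottom edge) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    x: float      # left edge (pixels)
    y: float      # top edge (pixels); y is assumed to grow downward
    w: float      # width (pixels)
    h: float      # height (pixels)
    score: float  # detector confidence


def level_of(box: Box, level_height: float, n_levels: int) -> int:
    """Index of the horizontal level containing the box's bottom edge (clamped)."""
    lvl = int((box.y + box.h) // level_height)
    return min(max(lvl, 0), n_levels - 1)


def mean_height_suppress(
    boxes: Sequence[Box],
    existence: Sequence[float],    # per-level existence score (assumed in [0, 1])
    mean_height: Sequence[float],  # per-level mean pedestrian height (pixels)
    level_height: float,
    min_existence: float = 0.1,    # hypothetical threshold
    height_tol: float = 0.5,       # hypothetical relative tolerance
) -> List[Box]:
    """Keep boxes on plausible levels whose height is close to the level mean."""
    kept = []
    for b in boxes:
        lvl = level_of(b, level_height, len(existence))
        # Principle 1: reject boxes on levels unlikely to contain pedestrians.
        if existence[lvl] < min_existence:
            continue
        # Principle 2: reject boxes whose height deviates too much from the mean.
        mh = mean_height[lvl]
        if mh > 0 and abs(b.h - mh) / mh > height_tol:
            continue
        kept.append(b)
    return kept
```

In this sketch the two checks are applied independently; how the paper actually combines existence scores and height deviation into one rule is not specified in the text above.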

Paper Structure

This paper contains 25 sections, 10 equations, 9 figures, 15 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Two examples of the unique characteristics of persons and of pedestrian dataset annotations. (a) Key points of a person used in zhang2020kgsnet. (b) Visible-part (green dashed line) and complete-body (yellow line) annotations used in chi2020pedhunter, mgan2019MGAN+.
  • Figure 2: Our motivation relies on the observation that pedestrian datasets, unlike general object datasets, exhibit a strong perspective effect. (a) Two sample images containing persons from the COCO dataset coco. Only a weak perspective effect is observed. The inclusion of the person category demonstrates the feasibility of applying general detectors to pedestrian detection tasks. (b) One typical sample image from the Citypersons dataset zhang2017citypersons. A severe perspective effect is observed, with parallel lane lines converging at one point.
  • Figure 3: Our proposed method can be better understood by examining an image from the Caltech pedestrian dataset dollar2011pedestrian that has a severe perspective effect. Parallel lane lines are marked in red and extended to converge at the vanishing point, marked with a black circle. The higher dashed line is the horizon line. The two principles that underlie this method are illustrated by two sample bounding boxes. Yellow box A lies in the impossible area above the horizon line. An object in this area would be standing above the ground plane, which is impossible for pedestrians in this image. Yellow box B stands at the same level (lower dashed line) as two ground truths (green boxes) but is too large for a normal pedestrian.
  • Figure 4: The black box shows the workflow of pedestrian detection with the proposed Mean Height Aided Suppression (MHAS) method. The green box shows the architecture of the embedded Existence Score Generator (ESG) block. In the workflow, solid lines depict the steps in conventional pedestrian detection, while dashed lines illustrate the proposed components. The term 'Detector' refers to any pedestrian detection algorithm employed. 'MHG' stands for Mean Height Generator. Details of MHG and MHAS are introduced in Sections \ref{sec: mhg} and \ref{sec: mhas}.
  • Figure 5: Sketch of the ground-truth existence scores used to train the ESG module and the notation used in the MHAS algorithm. The left part shows an annotated image separated into non-overlapping levels. Each level has a width of $b_h$ pixels. The red point marks the coordinate origin of the images referenced in this paper. Each bounding box has a width of $w_i$ and a height of $h_i$ ($i=1,2,3$). Its upper corner is located at $(x, y)$. The right column is the ground-truth existence score vector, with one entry per level. A level that contains the bottom edge of a bounding box is assigned 1; every other level is assigned 0.
  • ...and 4 more figures
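
The ground-truth existence score vector described in the Figure 5 caption can be illustrated with a short sketch. It assumes boxes given as `(x, y, w, h)` tuples with `(x, y)` the upper-left corner, an origin at the top of the image with y growing downward, and a level size called `level_height` here (the caption's $b_h$). The function name `existence_targets` and the clamping of boxes that touch the image border are hypothetical choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple

BoxXYWH = Tuple[float, float, float, float]  # (x, y, w, h), assumed layout


def existence_targets(
    boxes: Sequence[BoxXYWH],
    image_height: int,
    level_height: int,
) -> List[int]:
    """Binary vector with one entry per level: 1 if the level contains the
    bottom edge of at least one ground-truth box, otherwise 0."""
    # Number of non-overlapping levels needed to cover the image (ceiling division).
    n_levels = -(-image_height // level_height)
    targets = [0] * n_levels
    for _x, y, _w, h in boxes:
        bottom = y + h
        # Clamp so a box ending exactly on the image border maps to the last level.
        lvl = min(max(int(bottom // level_height), 0), n_levels - 1)
        targets[lvl] = 1
    return targets
```

For example, with a 480-pixel-high image and 120-pixel levels, three boxes whose bottom edges fall at y = 180, 350, and 170 mark the second and third levels, leaving the first and last at 0.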