MonoPIC -- A Monocular Low-Latency Pedestrian Intention Classification Framework for IoT Edges Using ID3 Modelled Decision Trees

Sriram Radhakrishna, Adithya Balasubramanyam

TL;DR

MonoPIC addresses the need for real-time pedestrian intention classification on IoT edge devices without depth perception. It integrates quaternion-based orientation and velocity features derived from MediaPipe Pose with an ID3 decision-tree classifier, achieving low latency and edge-friendly computation. On a monocular setup, it reports an average accuracy of $83.56\%$ with a latency of $48\,\mathrm{ms}$, outperforming depth-based monocular DL baselines while using far fewer calculations. This approach enables fast, power-efficient pedestrian avoidance on constrained devices, contributing to safer, scalable ITS deployments.

Abstract

Road accidents involving autonomous vehicles commonly occur when a pedestrian obstacle suddenly appears in the path of the moving vehicle, leaving the robot even less time to react to the change in scene. To tackle this issue, we propose a novel algorithmic implementation that procedurally classifies the intent of a single, arbitrarily chosen pedestrian in a two-dimensional frame into logic states using quaternions generated from a MediaPipe pose estimation model. This bypasses the need for relatively high-latency deep-learning algorithms, primarily because depth perception is not required and because most IoT edge devices impose an implicit cap on computational resources. The model achieved an average testing accuracy of 83.56% with a low variance of 0.0042 while operating with an average latency of 48 milliseconds, demonstrating multiple notable advantages over the current standard of using spatio-temporal convolutional networks for these perception tasks.
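
To make the ID3 criterion named in the title concrete, the following is a minimal sketch of how an ID3-style tree picks its split: it chooses the feature with the highest information gain (reduction in label entropy). The feature names (`orientation`, `speed`), their discrete values, and the intent labels are hypothetical placeholders chosen for illustration; they are not the paper's actual features, logic states, or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by partitioning the rows on one discrete feature."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 split rule: pick the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data (assumption): discretized orientation and speed.
rows = [
    {"orientation": "towards", "speed": "fast"},
    {"orientation": "towards", "speed": "slow"},
    {"orientation": "away", "speed": "fast"},
    {"orientation": "away", "speed": "slow"},
]
labels = ["crossing", "crossing", "not_crossing", "not_crossing"]

root = best_feature(rows, labels, ["orientation", "speed"])
```

In this toy set, `orientation` separates the two labels perfectly while `speed` carries no information, so the sketch selects `orientation` as the root split. A full ID3 tree would recurse on each partition with the remaining features.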

Paper Structure

This paper contains 20 sections, 13 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: A representation of a deep learning model, commonly used to evaluate pedestrian intent. Sub-figure a involves the traditional method with manual feature engineering and classification (turner1999conceptual), while sub-figure b illustrates a generic deep learning model given the task of both feature extraction and classification (lecun2015deep).
  • Figure 2: A simplified representation of the angle of orientation with the horizon being captured as referenced from radhakrishna2023economical.
  • Figure 3: A representation of the manner in which the skeletal pose landmarks are visualized. Note the origins of the vector space (kim2023human).
  • Figure 4: A plot showing the variation in readings, accurate to the degree with selective approximation up or down.
  • Figure 5: A representation of the directions of movement that can be observed on screen while evaluating a 2D frame, iterating from case 1 through 8.
  • ...and 5 more figures