BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud

Yunzhe Li, Jiajun Yan, Yuzhou Wei, Kechen Liu, Yize Zhao, Chong Zhang, Hongzi Zhu, Li Lu, Shan Chang, Minyi Guo

TL;DR

BlinkBud addresses rear-road hazards for pedestrians and cyclists by combining a camera-equipped earbud with a paired phone to perform sampled monocular 3D detection. It introduces pitch-invariant depth estimation, yaw-aware EKF tracking, and a reinforcement-learning-based optimal blink sampling strategy to minimize power while maintaining high tracking accuracy. The system demonstrates low power consumption (earbud ~29.8 mW, phone ~702.6 mW) and strong hazard-detection performance (FPR ~4.9%, FNR ~1.5%) in real-world tests across modes, road types, vehicle types, and lighting. Its privacy-preserving on-device processing, fast latency (~72 ms), and extensible design make it a practical approach for enhancing situational awareness in urban mobility scenarios.

Abstract

Being unaware of speeding vehicles approaching from behind poses a serious threat to the road safety of pedestrians and cyclists. In this paper, we propose BlinkBud, which uses a single earbud and a paired phone for online detection of hazardous objects approaching the user from behind. The core idea is to accurately track visually identified objects using a small number of sampled camera images taken from the earbud. To minimize the power consumption of the earbud and the phone while preserving tracking accuracy, we devise a novel 3D object tracking algorithm that integrates a Kalman-filter-based trajectory estimation scheme with an optimal image sampling strategy based on reinforcement learning. Moreover, the impact of constant user head movements on tracking accuracy is largely eliminated by using the estimated pitch angle to correct the object depth estimate and the estimated yaw angle to align the camera coordinate system with the user's body coordinate system. We implement a prototype BlinkBud system and conduct extensive real-world experiments. Results show that BlinkBud is lightweight, with ultra-low mean power consumption of 29.8 mW on the earbud and 702.6 mW on the smartphone, and that it accurately detects hazards with a low average false positive ratio (FPR) of 4.90% and false negative ratio (FNR) of 1.47%.
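To make the idea of tracking from a few sampled images concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional constant-velocity Kalman filter over the object's distance behind the user, a fixed sampling interval in place of the learned sampling policy, and noiseless synthetic measurements. The function names, noise values, and the `blink_every` parameter are all hypothetical.

```python
def predict(state, cov, dt, q_d=0.01, q_v=0.1):
    """Propagate [distance, velocity] and its 2x2 covariance by dt seconds."""
    d, v = state
    (p00, p01), (p10, p11) = cov
    # Constant-velocity motion model: d' = d + v * dt, v' = v.
    d_new = d + v * dt
    # P' = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q_d, q_v).
    n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q_d
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q_v
    return (d_new, v), ((n00, n01), (n10, n11))


def update(state, cov, z, r=0.25):
    """Correct the state with one distance measurement z (variance r)."""
    d, v = state
    (p00, p01), (p10, p11) = cov
    s = p00 + r                  # innovation variance, H = [1, 0]
    k0, k1 = p00 / s, p10 / s    # Kalman gain
    y = z - d                    # innovation
    new_state = (d + k0 * y, v + k1 * y)
    # P = (I - K H) P
    new_cov = (((1 - k0) * p00, (1 - k0) * p01),
               (p10 - k1 * p00, p11 - k1 * p01))
    return new_state, new_cov


def track_with_blinks(blink_every=5, steps=40, dt=0.1):
    """Track a synthetic object that starts 30 m behind and closes at 5 m/s.

    The filter predicts at every step but only receives a measurement on
    every `blink_every`-th step, standing in for a sampled camera frame.
    """
    true_d0, true_v = 30.0, -5.0          # assumed ground truth
    state = (true_d0, 0.0)                # velocity is unknown at start
    cov = ((1.0, 0.0), (0.0, 25.0))
    for k in range(1, steps + 1):
        state, cov = predict(state, cov, dt)
        if k % blink_every == 0:          # a frame is sampled at this step
            z = true_d0 + true_v * dt * k  # noiseless measurement (assumption)
            state, cov = update(state, cov, z)
    return state


if __name__ == "__main__":
    est_d, est_v = track_with_blinks()
    print(f"estimated distance {est_d:.2f} m, velocity {est_v:.2f} m/s")
```

In this sketch the filter coasts on its motion model between sampled frames, so a larger `blink_every` means fewer camera captures at the cost of more accumulated uncertainty. The paper replaces the fixed interval with a sampling strategy learned by reinforcement learning.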

Paper Structure

This paper contains 60 sections, 1 theorem, 21 equations, 15 figures, 2 tables.

Key Result

Theorem 1

The adaptive frame sampling strategy converges to the optimal policy.

Figures (15)

  • Figure 1: The illustration of the camera coordinate system on the earbud and the user coordinate system.
  • Figure 2: The architecture of BlinkBud, where the earbud and the phone cooperate to track hazardous moving objects approaching from behind a user with a minimal number of instantaneous visual perceptions.
  • Figure 3: CDF of distances when cars and cycles are first detected using images of different resolutions.
  • Figure 4: Histogram of tracking confidence with different distances, where the tracking confidence decreases as distance increases.
  • Figure 5: The prototype implementation of BlinkBud on an earbud (right) and an example from the field study (left), where a volunteer wearing the earbud walks along a road with motor traffic.
  • ...and 10 more figures

Theorems & Definitions (1)

  • Theorem 1