CarSpeedNet: Learning-Based Speed Estimation from Accelerometer-Only Inertial Sensing
Barak Or
TL;DR
This work tackles velocity estimation using only a smartphone accelerometer, in sensor-minimal scenarios where traditional velocity sources are unavailable. It introduces CarSpeedNet, a learning-based framework that leverages temporal context to infer speed from raw acceleration measurements without gyroscopes or external positioning. The framework interprets velocity as a latent state under partial observability, with the temporal window acting as an information horizon. Empirically, CarSpeedNet achieves RMSE = 2.9 m/s and MAE = 1.3 m/s with a 1 s window, improving to RMSE = 1.8 m/s and MAE = 0.72 m/s with longer windows, which illustrates the accuracy–latency trade-off under constrained sensing. The results demonstrate the feasibility of low-cost, on-device velocity estimation and provide conceptual guidance for latent-state representations in constrained robotic sensing, with potential applications in traffic safety and redundant navigation modules.
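The windowing idea and the two reported metrics can be made concrete with a short sketch. It assumes a fixed sampling rate (100 Hz here, a hypothetical value not stated in this summary) and a hypothetical helper `make_windows` that cuts a three-axis accelerometer stream into fixed-length, overlapping windows, each of which a model would map to one speed estimate. RMSE and MAE are the standard definitions. This is an illustration under those assumptions, not the CarSpeedNet data pipeline.

```python
import numpy as np


def make_windows(acc, fs_hz, window_s, stride_s):
    """Slice an (N, 3) accelerometer stream into overlapping windows.

    Hypothetical helper: returns an array of shape (num_windows, window_len, 3).
    """
    acc = np.asarray(acc, dtype=float)
    if acc.ndim != 2 or acc.shape[1] != 3:
        raise ValueError("expected an (N, 3) array of x/y/z accelerations")
    win = int(round(window_s * fs_hz))   # samples per window (the "information horizon")
    step = int(round(stride_s * fs_hz))  # samples between window starts
    if win < 1 or step < 1:
        raise ValueError("window and stride must each span at least one sample")
    starts = range(0, acc.shape[0] - win + 1, step)
    if len(starts) == 0:
        return np.empty((0, win, 3))
    return np.stack([acc[s:s + win] for s in starts])


def rmse(pred, true):
    """Root-mean-square error between predicted and reference speeds."""
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(pred, true):
    """Mean absolute error between predicted and reference speeds."""
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.mean(np.abs(err)))


if __name__ == "__main__":
    fs = 100.0                   # hypothetical sampling rate [Hz]
    acc = np.zeros((1000, 3))    # 10 s of placeholder accelerometer data
    windows_1s = make_windows(acc, fs, window_s=1.0, stride_s=0.5)
    windows_4s = make_windows(acc, fs, window_s=4.0, stride_s=0.5)
    print(windows_1s.shape, windows_4s.shape)
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

A longer window gives the model more samples per estimate, but the first estimate can only be produced once the window has filled, which is one way to see the accuracy–latency trade-off described above.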
Abstract
Velocity estimation is a core component of state estimation and sensor fusion pipelines in mobile robotics and autonomous ground systems, directly affecting navigation accuracy, control stability, and operational safety. In conventional systems, velocity is obtained from wheel encoders, inertial navigation units, or tightly coupled multi-sensor fusion architectures. However, these sensing configurations are not always available or reliable, particularly in low-cost, redundancy-constrained, or degraded operational scenarios where sensors may fail, drift, or become temporarily unavailable. This paper investigates the feasibility of estimating vehicle speed using only a single low-cost inertial sensor: the three-axis accelerometer embedded in a commodity smartphone. We present CarSpeedNet, a learning-based inertial estimation framework designed to infer speed directly from raw accelerometer measurements, without access to gyroscopes, wheel odometry, vehicle bus data, or external positioning during inference. From a sensor fusion perspective, this setting represents an extreme case of sensing sparsity, in which classical integration-based or filter-based approaches become unstable due to bias accumulation and partial observability; integrating even a small constant accelerometer bias, for instance, yields a speed error that grows linearly with time. Rather than explicitly estimating physical states such as orientation or sensor bias, the proposed approach approximates these latent states implicitly from temporal windows of accelerometer data.
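The bias-accumulation argument can be illustrated numerically. The sketch below uses entirely hypothetical values (100 Hz sampling, a 0.05 m/s² constant bias, 0.02 m/s² white noise, and a single longitudinal axis assumed to be already aligned with the vehicle) and integrates the biased signal with a simple rectangle rule. The resulting speed error grows roughly as b·t, reaching about 3 m/s after 60 s in this setup. It deliberately ignores orientation and gravity leakage, which make the real problem harder, and it is not part of CarSpeedNet.

```python
import numpy as np


def integrate_speed(acc_long, fs_hz, v0=0.0):
    """Naive dead reckoning: rectangle-rule cumulative sum of longitudinal acceleration."""
    dt = 1.0 / fs_hz
    return v0 + np.cumsum(np.asarray(acc_long, dtype=float)) * dt


fs = 100.0        # hypothetical sampling rate [Hz]
duration = 60.0   # simulated drive length [s]
n = int(duration * fs)
t = np.arange(1, n + 1) / fs

# Hypothetical ground truth: accelerate at 1 m/s^2 for 10 s, then cruise at 10 m/s.
true_acc = np.where(t <= 10.0, 1.0, 0.0)
true_speed = integrate_speed(true_acc, fs)

# Hypothetical sensor model: constant bias plus zero-mean white noise.
bias = 0.05                              # [m/s^2]
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.02, size=n)    # [m/s^2]
measured_acc = true_acc + bias + noise
est_speed = integrate_speed(measured_acc, fs)

# Speed error = bias * t + integrated noise, so it drifts without bound.
err = est_speed - true_speed
for seconds in (10, 30, 60):
    idx = int(seconds * fs) - 1
    print(f"t = {seconds:2d} s: speed error {err[idx]:.2f} m/s (bias * t = {bias * seconds:.2f})")
```

Because the bias term dominates the integrated noise here, the printed error tracks b·t closely; a filter could only correct it with an external velocity or position reference, which this sensing setting does not provide.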
