An Inertial Sequence Learning Framework for Vehicle Speed Estimation via Smartphone IMU
Xuan Xiao, Xiaotong Ren, Haitao Li
TL;DR
This work tackles smartphone-based vehicle speed estimation when GNSS is unreliable by introducing DVSE, a temporal-inference framework that learns directly from IMU data under GNSS supervision. It decomposes the problem into two specialized components: a noise compensation network that fits sensor disturbances with a GRU-based sequence model, and a motion transformation network (MTN) that aligns the phone and vehicle coordinate systems using TCN-based pose estimation. A data augmentation strategy simulates diverse phone placements, and a loss-matching technique compensates for GNSS-IMU timestamp delays, which makes training more robust. Experiments on a large real-world crowdsourced dataset show that DVSE achieves higher accuracy and better generalization than baselines. The model is deployed efficiently on smartphones via ONNX-Runtime, suggesting practical benefits for plug-and-play mobile navigation in GNSS-challenged environments.
Abstract
Accurately estimating vehicle velocity with a smartphone is critical for mobile navigation and transportation. This paper introduces a velocity estimation framework built on temporal learning models that takes Inertial Measurement Unit (IMU) data as input and is supervised by Global Navigation Satellite System (GNSS) information. The framework employs a noise compensation network to fit the noise distribution between sensor measurements and actual motion, and a pose estimation network to align the coordinate systems of the phone and the vehicle. To improve the model's generalizability, we propose a data augmentation technique that mimics various phone placements within the car. Moreover, we design a new loss function that mitigates timestamp mismatches between GNSS and IMU signals, aligning the two signals and improving velocity estimation accuracy. Finally, we implement a highly efficient prototype and conduct extensive experiments on a real-world crowdsourced dataset, which show superior accuracy and efficiency compared with baseline methods.
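The two training aids named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that placement augmentation can be approximated by applying one random rotation to both accelerometer and gyroscope sequences, and that a timestamp-tolerant loss can be approximated by taking the minimum mean squared error over a small window of integer sample shifts. The function names and the `max_shift` parameter are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (hypothetical helper)."""
    # QR decomposition of a Gaussian matrix yields an orthogonal factor q.
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # normalize column signs
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis so the determinant is +1
    return q


def augment_placement(accel, gyro, rng):
    """Rotate (T, 3) accelerometer and gyroscope sequences by one shared
    random rotation, as a stand-in for a different phone orientation."""
    rot = random_rotation(rng)
    return accel @ rot.T, gyro @ rot.T


def shift_tolerant_loss(pred, target, max_shift=2):
    """Minimum MSE between two 1-D speed sequences over integer shifts
    in [-max_shift, max_shift] (assumed form, not the paper's loss)."""
    n = len(pred)
    best = np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            p, t = pred[s:], target[:n - s]  # pred lags target by s samples
        else:
            p, t = pred[:n + s], target[-s:]  # pred leads target by |s| samples
        best = min(best, float(np.mean((p - t) ** 2)))
    return best
```

A rotation leaves the magnitude of each sample unchanged, so the augmented sequences describe the same motion seen from a differently oriented phone. The shifted loss, in turn, does not penalize a prediction that is correct up to a delay of a few samples.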
