An Inertial Sequence Learning Framework for Vehicle Speed Estimation via Smartphone IMU

Xuan Xiao, Xiaotong Ren, Haitao Li

TL;DR

This work tackles smartphone-based vehicle speed estimation when GNSS is unreliable by introducing DVSE, a temporal-inference framework that learns directly from IMU data under GNSS supervision. It decomposes the problem into two specialized components: a noise compensation network that fits sensor disturbances with a GRU-based sequence model, and a motion transformation network (MTN) that aligns the phone and vehicle coordinate systems using TCN-driven pose estimation. A data augmentation strategy simulates diverse phone placements, and a loss-matching technique compensates for GNSS-IMU timestamp delays, enabling robust training. Experiments on a large real-world crowdsourced dataset show that DVSE achieves higher accuracy and better generalization than baselines and deploys efficiently on smartphones via ONNX-Runtime, suggesting practical benefits for plug-and-play mobile navigation in GNSS-challenged environments.

Abstract

Accurately estimating vehicle velocity via smartphone is critical for mobile navigation and transportation. This paper introduces a framework for velocity estimation that uses temporal learning models, takes Inertial Measurement Unit (IMU) data as input, and is supervised by Global Navigation Satellite System (GNSS) information. The framework employs a noise compensation network to fit the noise distribution between sensor measurements and actual motion, and a pose estimation network to align the coordinate systems of the phone and the vehicle. To enhance the model's generalizability, we propose a data augmentation technique that mimics various phone placements within the car. Moreover, a new loss function is designed to mitigate timestamp mismatches between GNSS and IMU signals, aligning the signals and improving velocity estimation accuracy. Finally, we implement a highly efficient prototype and conduct extensive experiments on a real-world crowdsourced dataset, which show superior accuracy and efficiency.
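The abstract does not spell out how the placement augmentation or the timestamp-tolerant loss is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that a phone placement can be mimicked by applying one random 3D rotation to the accelerometer and gyroscope axes of a window, and that a timestamp mismatch can be tolerated by taking the smallest error over a few integer shifts of the GNSS labels. The function names, the `(T, 6)` window layout, and the `max_lag` parameter are all hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the orthogonal factor
    if np.linalg.det(q) < 0:      # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

def augment_placement(imu, rng):
    """Rotate a (T, 6) window of [accel xyz, gyro xyz] into a random orientation.

    Assumption: one fixed rotation per window stands in for one phone placement.
    """
    rot = random_rotation(rng)
    accel = imu[:, :3] @ rot.T
    gyro = imu[:, 3:] @ rot.T
    return np.concatenate([accel, gyro], axis=1)

def lag_tolerant_loss(pred, gnss, max_lag):
    """Smallest mean absolute error over integer label shifts in [-max_lag, max_lag].

    Assumption: the GNSS-IMU delay is a whole number of samples.
    """
    n = len(pred)
    best = np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            p, g = pred[lag:], gnss[:n - lag]
        else:
            p, g = pred[:n + lag], gnss[-lag:]
        best = min(best, float(np.mean(np.abs(p - g))))
    return best

# Usage on synthetic data: GNSS labels that arrive two samples late.
rng = np.random.default_rng(0)
window = rng.normal(size=(50, 6))
rotated = augment_placement(window, rng)
speed = np.sin(np.linspace(0.0, 3.0, 50))
delayed = np.concatenate([np.zeros(2), speed[:-2]])
loss_with_shift = lag_tolerant_loss(speed, delayed, max_lag=3)
loss_no_shift = lag_tolerant_loss(speed, delayed, max_lag=0)
```

In this toy setting the rotation changes the per-axis readings but not the magnitude of each acceleration or angular-rate sample, and the shifted loss falls to zero once the search range covers the two-sample delay.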

Paper Structure

This paper contains 37 sections, 14 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Left: Driving with the assistance of a navigation application. Right: Vehicle driving in GNSS-blocked environments.
  • Figure 2: Coordinate system transformation from the phone to the vehicle.
  • Figure 3: The architecture of our model. The model consists of three parts: sequential learning for noise (Sec. \ref{sec:noise}), motion transformation from phone to vehicle (Sec. \ref{sec:mtn}), and loss calculation (Sec. \ref{sec:loss}). $\bigodot$ stands for dot multiplication, and $\bigoplus$ stands for addition.
  • Figure 4: Model architecture of the noise compensation block. 2x means that the module is repeated twice, and [32,64] means the number of hidden units.
  • Figure 5: The architecture of the motion transformation network. The TCN block contains 2 temporal blocks, and each temporal block consists of 3 dilated causal layers. $k$ is the kernel size and $d$ is the dilation factor.
  • ...and 4 more figures
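
The Figure 5 caption refers to dilated causal layers with kernel size $k$ and dilation factor $d$. As a rough illustration of what one such layer computes, here is a minimal single-channel NumPy sketch. The weights, the dilation values, and the helper names are hypothetical and are not taken from the paper; a real TCN would also add multiple channels, nonlinearities, and residual connections.

```python
import numpy as np

def dilated_causal_conv1d(x, weights, dilation):
    """Compute y[t] = sum_j weights[j] * x[t - j * dilation].

    Samples before the start of x are treated as zero, so y[t] never
    depends on future inputs (the causal property).
    """
    k = len(weights)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for j, w in enumerate(weights):
        start = pad - j * dilation
        y += w * padded[start:start + len(x)]
    return y

def receptive_field(kernel_size, dilations):
    """Number of input samples one output sees after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Usage with hypothetical values: k = 2, d = 2 on a short ramp signal.
x = np.arange(1.0, 7.0)                       # [1, 2, 3, 4, 5, 6]
y = dilated_causal_conv1d(x, [1.0, 1.0], 2)   # y[t] = x[t] + x[t - 2]
rf = receptive_field(3, [1, 2, 4])            # three stacked layers, k = 3
```

Stacking layers with growing dilation widens the receptive field quickly while keeping the layer count small, which is the usual motivation for this design in sequence models.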