Estimating Vehicle Speed on Roadways Using RNNs and Transformers: A Video-based Approach
Sai Krishna Reddy Mareddy, Dhanush Upplapati, Dhanush Kumar Antharam
TL;DR
The paper addresses non-intrusive vehicle speed estimation from surveillance video by comparing recurrent architectures (basic RNN, LSTM, GRU) with Transformers, all operating on bounding-box-based motion features. LSTM and GRU generally outperform the basic RNN, while Transformers remain robust as sequences grow longer. On the VS13 dataset, the LSTM reaches 94.25% accuracy with an RMSE of 3.96; on I24-3D, it reaches 78.62% accuracy with an RMSE of 10.99, and the Transformer scores slightly lower on I24-3D. The work proposes a scalable framework that reuses existing camera infrastructure for real-time traffic monitoring, with potential applications in traffic management and road safety.
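As a rough illustration of the two quantities named above, the sketch below derives simple per-frame motion features from a track of bounding boxes and computes RMSE between predicted and reference speeds. The feature layout (centroid displacement plus change in box width and height) and the function names `bbox_motion_features` and `rmse` are assumptions made for this example; the summary does not specify the exact features the authors use.

```python
import math

def bbox_motion_features(boxes):
    """Per-step motion features from a track of (x1, y1, x2, y2) boxes.

    Hypothetical feature layout: (dx, dy, dw, dh), i.e. centroid
    displacement and change in box size between consecutive frames.
    """
    feats = []
    for (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) in zip(boxes, boxes[1:]):
        # Box centers in the earlier (a) and later (b) frame.
        acx, acy = (ax1 + ax2) / 2, (ay1 + ay2) / 2
        bcx, bcy = (bx1 + bx2) / 2, (by1 + by2) / 2
        dw = (bx2 - bx1) - (ax2 - ax1)
        dh = (by2 - by1) - (ay2 - ay1)
        feats.append((bcx - acx, bcy - acy, dw, dh))
    return feats

def rmse(predicted, reference):
    """Root mean squared error between two equal-length speed lists."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

A track of N boxes yields N - 1 feature vectors, which could then be windowed into fixed-length sequences for a recurrent or Transformer model.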
Abstract
This project explores the application of machine learning models, specifically Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and Transformers, to vehicle speed estimation from video data. Traditional speed estimation methods, such as radar and manual systems, are often constrained by high costs, limited coverage, and potential disruptions. In contrast, combining existing surveillance infrastructure with neural network architectures offers a non-intrusive, scalable solution. Our approach uses LSTM and GRU to capture long-term dependencies within the temporal sequence of video frames, while Transformers use self-attention to process entire sequences in parallel and focus on the most informative segments of the data. This study demonstrates that both LSTM and GRU outperform basic Recurrent Neural Networks (RNNs) due to their gating mechanisms. Furthermore, increasing the input sequence length consistently improves model accuracy, highlighting the importance of contextual information in dynamic environments. Transformers in particular adapt well across varied sequence lengths and complexities, making them well suited to real-time applications in diverse traffic conditions. The findings suggest that integrating these models can significantly enhance the accuracy and reliability of automated speed detection systems, with potential benefits for traffic management and road safety.
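To make the self-attention idea concrete, the following is a minimal sketch of scaled dot-product self-attention over a short sequence of feature vectors. It assumes identity projections (queries, keys, and values are the inputs themselves), whereas a real Transformer layer learns separate projection matrices; the function names are illustrative and this is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq.

    Assumption: identity projections, used here only for illustration.
    Returns (outputs, weights); weights[i][j] is how much position i
    attends to position j.
    """
    dim = len(seq[0])
    outputs, weights = [], []
    for query in seq:
        # Every position is scored against the whole sequence at once,
        # with no step-by-step recurrence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in seq
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, seq)) for i in range(dim)]
        )
    return outputs, weights
```

Each row of `weights` sums to 1, so every output vector is a weighted average of the whole sequence. That is the sense in which attention can emphasize the most informative frames, and because each row is computed independently of the others, the rows can be evaluated in parallel.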
