Lane Change Classification and Prediction with Action Recognition Networks
Kai Liang, Jun Wang, Abhir Bhalerao
TL;DR
This work addresses the prediction and classification of lane change maneuvers of surrounding vehicles using semantic visual information rather than physical variables alone. It introduces two end-to-end action-recognition–based frameworks, one operating on RGB video (RGB+3DN) and one on RGB video with bounding-box augmentation (RGB+BB+3DN), and evaluates them across seven 3D CNN architectures, including I3D, SlowFast, and X3D, on the PREVENTION dataset. The study reports state-of-the-art performance for RGB-only lane change classification (up to 84.79% top-1 with X3D-S) and near-perfect results with bounding-box augmentation (≈99% top-1). It also provides CAM-based insights into the spatio-temporal regions that drive predictions, and finds that smaller temporal kernels can capture motion cues better. The results indicate that action-recognition models are practical for autonomous driving perception, with significant gains in both classification and early prediction, and suggest that integrating detection pipelines in future work could reduce the dependence on annotations.
Abstract
Anticipating the lane change intentions of surrounding vehicles is crucial for efficient and safe decision making in an autonomous driving system. Previous works often use physical variables such as driving speed and acceleration for lane change classification. However, physical variables do not contain semantic information. Although 3D CNNs have been developing rapidly, few methods use action recognition models and appearance features for lane change recognition, and all of them require additional information to pre-process the data. In this work, we propose an end-to-end framework comprising two action recognition methods for lane change recognition, using video data collected by cameras. Using only the RGB video data of the PREVENTION dataset, our method achieves the best lane change classification results. Class activation maps demonstrate that action recognition models can efficiently extract lane change motions. We also propose a method to better extract motion cues.
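To make the idea of bounding-box augmentation on RGB video more concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that augmentation means drawing a rectangle outline for the target vehicle onto each frame before the clip is passed to a 3D network; the function names `draw_box` and `augment_clip`, the box format `(x1, y1, x2, y2)`, and the colour and thickness defaults are all hypothetical choices made for illustration.

```python
import numpy as np


def draw_box(frame, box, color=(255, 0, 0), thickness=2):
    """Return a copy of `frame` (H, W, 3, uint8) with a rectangle outline.

    `box` is (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive.
    This box format is an assumption made for this sketch.
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    # Clip the box to the image so slicing never wraps around.
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    out = frame.copy()
    if x2 <= x1 or y2 <= y1:
        return out  # box lies outside the frame; nothing to draw
    t = max(1, thickness)
    out[y1:min(y1 + t, y2), x1:x2] = color  # top edge
    out[max(y2 - t, y1):y2, x1:x2] = color  # bottom edge
    out[y1:y2, x1:min(x1 + t, x2)] = color  # left edge
    out[y1:y2, max(x2 - t, x1):x2] = color  # right edge
    return out


def augment_clip(clip, boxes, **kwargs):
    """Draw one box per frame on a clip of shape (T, H, W, 3).

    `boxes` holds one box per frame, or None where no box is available.
    """
    if len(boxes) != len(clip):
        raise ValueError("need exactly one box (or None) per frame")
    frames = [
        frame if box is None else draw_box(frame, box, **kwargs)
        for frame, box in zip(clip, boxes)
    ]
    return np.stack(frames)


if __name__ == "__main__":
    # Synthetic black clip: 4 frames of 32x32 RGB.
    clip = np.zeros((4, 32, 32, 3), dtype=np.uint8)
    boxes = [(4, 4, 20, 20)] * 3 + [None]
    augmented = augment_clip(clip, boxes)
    print(augmented.shape)
```

The augmented clip keeps the shape of the input, so under these assumptions it could be fed to the same 3D CNN as the plain RGB clip without changing the network input layer.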
