
Designing deep neural networks for driver intention recognition

Koen Vellenga, H. Joe Steinhauer, Alexander Karlsson, Göran Falkman, Asli Rhodin, Ashok Koppisetty

TL;DR

Neural architecture search is applied to investigate how the deep neural network architecture affects a real-world, safety-critical application with limited computational capabilities. The results indicate that multiple architectures yield similar performance, regardless of the deep neural network layer type or fusion strategy.

Abstract

Driver intention recognition studies increasingly rely on deep neural networks. Deep neural networks have achieved top performance for many different tasks, but it is not common practice to explicitly analyse the complexity and performance of the network's architecture. Therefore, this paper applies neural architecture search to investigate the effects of the deep neural network architecture on a real-world, safety-critical application with limited computational capabilities. We explore a pre-defined search space for three deep neural network layer types that are capable of handling sequential data (a long short-term memory, a temporal convolution, and a time-series transformer layer), and the influence of different data fusion strategies on the driver intention recognition performance. A set of eight search strategies is evaluated on two driver intention recognition datasets. For both datasets, we observed that no search strategy clearly samples better deep neural network architectures. However, performing an architecture search does improve the model performance compared to the original manually designed networks. Furthermore, we observe no relation between increased model complexity and higher driver intention recognition performance. The results indicate that multiple architectures yield similar performance, regardless of the deep neural network layer type or fusion strategy.
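
The search procedure summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The layer types are taken from the abstract, and the three fusion options are the strategies named in the paper's Figure 1. The `num_layers` and `hidden_units` ranges, the function names, and the placeholder scoring function are all assumptions made for illustration. Plain random search is used only because it is the simplest strategy to show; it is not claimed to be one of the eight strategies evaluated in the paper.

```python
import random

# Search space: layer types and fusion strategies follow the paper;
# the numeric ranges below are hypothetical.
SEARCH_SPACE = {
    "layer_type": ["lstm", "temporal_conv", "transformer"],
    "fusion": ["early", "intermediate", "late"],
    "num_layers": [1, 2, 3],        # assumed range
    "hidden_units": [16, 32, 64],   # assumed range
}


def sample_architecture(rng):
    """Draw one value per dimension of the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def estimate_performance(arch, rng):
    """Placeholder score in [0, 1).

    A real estimator would train the network described by `arch` and
    return a validation metric. Here the architecture is ignored and a
    random number is returned so that the loop can run on its own.
    """
    return rng.random()


def random_search(n_trials, seed=0):
    """Sample architectures, score each one, and keep the full history."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch, rng)
        history.append((score, arch))
    # Compare on the score only; dictionaries are not orderable.
    best = max(history, key=lambda item: item[0])
    return best, history


best, history = random_search(n_trials=10)
print("best score:", round(best[0], 3))
print("best architecture:", best[1])
```

A more informed search strategy would replace `sample_architecture` with a proposal step that reads `history` before choosing the next candidate.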

Paper Structure

This paper contains 21 sections, 8 equations, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: Schematic overview of deep learning fusion strategies. The blue box indicates where the fusion operation is performed. (A) The early fusion strategy expects a concatenated input vector of all modalities. (B) The intermediate fusion strategy first learns a representation for one or multiple modalities and then learns a joint representation later in the network. (C) Late fusion first predicts per modality and then combines all uni-modal predictions into a final decision.
  • Figure 2: Overview of the neural architecture search framework. First, a search space has to be defined, after which a search strategy will sample DNN architectures. The estimated performance of the architecture is stored and the results are used by the search strategy as input to propose the next architecture. Figure adapted from elsken2019neural.
  • Figure 3: Generic framework describing evolutionary algorithms. The population is a set of architectures sampled from the search space in a NAS context, and every iteration is called a generation (figure adapted from darwish2020survey).
  • Figure 4: Overview of the driver intention recognition scenarios. (A) Ego-vehicle (black) driver's lane change intention (Jain2016). (B) Ego-vehicle driver's turn maneuver intention at an intersection (Jain2016). (C) Roundabout driving maneuver intention (zyner2019acfr).
  • Figure 5: Visualization of the five-fold cross-validation performance and model complexity for the top three models of each layer type sampled by the search strategies. Best viewed in color.
  • ...and 1 more figure
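
The three fusion strategies described in the Figure 1 caption can be contrasted with a small sketch. It is a toy illustration under stated assumptions, not the networks used in the paper. The two modality names (`cabin`, `road`), the feature sizes, the random weights, and the helper functions are hypothetical. Each modality is reduced to a single feature vector instead of a sequence, and late fusion combines predictions by simple averaging, which is only one possible combination rule.

```python
import math
import random


def linear(x, weights):
    """Bias-free linear map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_weights(rng, n_out, n_in):
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def early_fusion(cabin, road, w_out):
    # (A) Concatenate the raw modality features, then predict once.
    return softmax(linear(cabin + road, w_out))


def intermediate_fusion(cabin, road, w_cabin, w_road, w_out):
    # (B) Learn one representation per modality, then fuse the representations.
    h_cabin = [math.tanh(v) for v in linear(cabin, w_cabin)]
    h_road = [math.tanh(v) for v in linear(road, w_road)]
    return softmax(linear(h_cabin + h_road, w_out))


def late_fusion(cabin, road, w_cabin, w_road):
    # (C) Predict per modality, then combine the predictions (here: average).
    p_cabin = softmax(linear(cabin, w_cabin))
    p_road = softmax(linear(road, w_road))
    return [(a + b) / 2 for a, b in zip(p_cabin, p_road)]


rng = random.Random(0)
N_CLASSES, HIDDEN = 3, 4      # assumed sizes
cabin = [0.2, -0.1, 0.5]      # hypothetical in-cabin features
road = [1.0, 0.3]             # hypothetical road-scene features

p_early = early_fusion(cabin, road, init_weights(rng, N_CLASSES, 5))
p_mid = intermediate_fusion(
    cabin, road,
    init_weights(rng, HIDDEN, 3),
    init_weights(rng, HIDDEN, 2),
    init_weights(rng, N_CLASSES, 2 * HIDDEN),
)
p_late = late_fusion(
    cabin, road,
    init_weights(rng, N_CLASSES, 3),
    init_weights(rng, N_CLASSES, 2),
)
print(p_early, p_mid, p_late)
```

The only structural difference between the three functions is where the modalities are joined: at the input, at the hidden representation, or at the prediction.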