Beyond Empirical Windowing: An Attention-Based Approach for Trust Prediction in Autonomous Vehicles

Minxue Niu, Zhaobo Zheng, Kumar Akash, Teruhisa Misu

TL;DR

The paper tackles trust prediction in autonomous vehicles (AVs) from long multimodal time series by addressing the fixed-window limitation of traditional windowing. It introduces SWAN, an attention-based network that combines limited-range self-attention with a windowing mechanism and window saliency weighting. This lets it localize and emphasize the intervals where trust changes without an extensive search over window sizes. On a new public AV trust dataset, SWAN outperforms empirical windowing, CNN-LSTM, and Transformer baselines and remains robust across a wide range of window sizes, while offering interpretable attention patterns at low computational cost. The approach is practically useful for human-machine interaction research and AV trust modeling, where labeled data are scarce and signals span long durations.
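To make the window-size sensitivity and the idea of window saliency weighting concrete, here is a minimal pure-Python sketch under stated assumptions. It is not the paper's implementation. `make_windows` performs ordinary fixed-size windowing, and `saliency_pool` combines per-window features using softmax weights over per-window scores. All function names are hypothetical. In SWAN the weighting is learned, whereas the usage example below uses a hand-picked stand-in score (the within-window range).

```python
import math
from typing import List, Sequence


def make_windows(signal: Sequence[float], size: int, stride: int) -> List[List[float]]:
    """Split a 1-D signal into fixed-size windows; a ragged tail is dropped."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, stride)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def saliency_pool(features: Sequence[float], scores: Sequence[float]) -> float:
    """Weighted average of per-window features, with weights softmax(scores)."""
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    weights = softmax(scores)
    return sum(w * f for w, f in zip(weights, features))


# Usage: a toy signal whose middle section fluctuates strongly.
signal = [0.0, 0.1, 0.0, 0.1, 2.0, -1.5, 2.5, -2.0, 0.0, 0.1]
windows = make_windows(signal, size=2, stride=2)
features = [sum(w) / len(w) for w in windows]  # per-window mean
scores = [max(w) - min(w) for w in windows]    # hypothetical saliency score: within-window range
print(saliency_pool(features, scores))
```

Changing `size` or `stride` changes which samples share a window, and therefore changes the pooled result. A fixed-window baseline has to tune these values by search, and that is the sensitivity SWAN aims to reduce.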

Abstract

Humans' internal states play a key role in human-machine interaction, leading to the rise of human state estimation as a prominent field. Compared to modeling swift state changes such as surprise and irritation, modeling gradual states like trust and satisfaction is further challenged by label sparsity: long time-series signals are usually associated with a single label, making it difficult to identify the critical span of state shifts. Windowing is a widely used technique for enabling localized analysis of long time-series data. However, the performance of downstream models can be sensitive to the window size, and determining the optimal window size demands domain expertise and extensive search. To address this challenge, we propose a Selective Windowing Attention Network (SWAN), which employs window prompts and masked attention transformation to enable the selection of attended intervals with flexible lengths. We evaluate SWAN on trust prediction using a new multimodal driving simulation dataset. Experiments show that SWAN significantly outperforms an existing empirical window selection baseline and neural network baselines, including CNN-LSTM and Transformer. Furthermore, compared to the traditional windowing approach, it is robust across a wide span of windowing ranges.
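The abstract describes masked attention that restricts which time steps may attend to one another. The pure-Python sketch below illustrates limited-range (banded) masked self-attention in general. It is not SWAN itself. Each time step attends only to neighbors within a hypothetical `radius`. The sketch assumes queries, keys, and values are the raw input vectors, with no learned projections, and it omits SWAN's window prompts entirely. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def band_mask(n: int, radius: int) -> List[List[bool]]:
    """mask[i][j] is True when time step j lies within `radius` steps of i."""
    return [[abs(i - j) <= radius for j in range(n)] for i in range(n)]


def local_self_attention(x: Sequence[Vector], radius: int) -> Tuple[List[Vector], List[List[float]]]:
    """Scaled dot-product self-attention restricted to a band of +/- radius steps.

    Queries, keys, and values are all the raw inputs (no learned projections).
    Returns the attended outputs and the attention weight matrix.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n, d = len(x), len(x[0])
    mask = band_mask(n, radius)
    outputs: List[Vector] = []
    weights: List[List[float]] = []
    for i in range(n):
        # Masked-out positions get -inf so that exp() maps them to exactly 0.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(n)
        ]
        m = max(scores)  # finite, because position i always attends to itself
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append([sum(row[j] * x[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights


# Usage: 5 time steps of 2-D features; each step sees at most 1 neighbor per side.
x = [[0.1, 0.0], [0.2, 0.1], [1.5, 1.2], [0.1, 0.2], [0.0, 0.1]]
out, attn = local_self_attention(x, radius=1)
for row in attn:
    print([round(w, 3) for w in row])
```

The returned weight matrix is zero outside the band, so each row shows which nearby steps a position drew on. This kind of pattern underlies the attention visualizations the paper reports.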

Paper Structure (17 sections, 8 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: SWAN structure.
  • Figure 2: Four frames taken within 10 seconds of a sidewalk mobility video, and visualizations of the model's attention weights over this interval.
  • Figure 3: Performance across window ranges/sizes.