Sentiment Matters: An Analysis of 200 Human-SAV Interactions

Lirui Guo, Michael G. Burke, Wynita M. Griggs

TL;DR

This work addresses the need to understand how conversational sentiment and prompting strategies in human-SAV interactions influence user acceptance and perceived service quality. It introduces an open-source dataset of 2,136 human-SAV exchanges and 200 post-interaction surveys, collected from 50 participants who each interacted with four GPT-3.5-driven SAV agents configured with different prompts. The authors present two benchmarks. Case Study 1 combines predictive modeling with chord diagrams to identify item-level drivers of SAV acceptance and reveals that sentiment polarity becomes a key predictor under certain prompts. Case Study 2 compares an LLM-based sentiment analyzer with TextBlob and finds that the LLM approach aligns more closely with self-reported sentiment, although the alignment is modest; it also highlights limitations arising from the text-only signal and from contextual factors. The dataset and findings offer actionable guidance for sentiment-aware, adaptive SAV interfaces and lay a foundation for future multimodal sentiment modeling in autonomous vehicle interactions. The work emphasizes practical implications for real-time sentiment monitoring and prompt design, while acknowledging the need for broader participant samples and richer cues to improve predictive accuracy.

Abstract

Shared Autonomous Vehicles (SAVs) are likely to become an important part of the transportation system, which makes effective human-SAV interaction a key area of research. To further this area of study, we present an open-source conversational dataset of 200 human-SAV interactions, comprising textual data (2,136 human-SAV exchanges) and empirical data (post-interaction survey results on a range of psychological factors). We demonstrate the dataset's utility through two benchmark case studies. First, using random forest modeling and chord diagrams, we identify key predictors of SAV acceptance and perceived service quality, highlighting the critical influence of response sentiment polarity (i.e., perceived positivity). Second, we benchmark an LLM-based sentiment analysis tool against the traditional lexicon-based TextBlob method. Results indicate that even simple zero-shot LLM prompts produce sentiment scores that align more closely with user-reported sentiment than TextBlob does, though limitations remain. This study provides novel insights for designing conversational SAV interfaces and establishes a foundation for further exploration of advanced sentiment modeling, adaptive user interactions, and multimodal conversational systems.
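The item-level importances behind the chord diagrams in Case Study 1 are rescaled so that all predictors together sum to 100% (as described in the Figure 2 caption). The Python sketch below shows only that normalization step and the grouping of item arcs by latent factor. It assumes raw, non-negative importance scores are already available from a fitted model (for instance, scikit-learn's random forest estimators expose impurity-based scores as `feature_importances_`). The item names, factor grouping, scores, and helper functions are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical raw importance scores for six survey items grouped into two
# latent factors (A and B), mirroring the layout of the example chord diagram.
# Real scores could come from a fitted model, e.g. the impurity-based
# feature_importances_ attribute of a scikit-learn random forest.
raw_importance = {
    "A1": 0.08, "A2": 0.12, "A3": 0.30,
    "B1": 0.10, "B2": 0.15, "B3": 0.05,
}


def normalize_to_percent(importance: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative importance scores so that they sum to 100."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {item: 100.0 * score / total for item, score in importance.items()}


def factor_totals(percent: dict[str, float]) -> dict[str, float]:
    """Sum item-level percentages per latent factor.

    Assumes the hypothetical naming scheme above, where the first character
    of an item name identifies its latent factor.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for item, share in percent.items():
        totals[item[0]] += share
    return dict(totals)


percent = normalize_to_percent(raw_importance)
top_item = max(percent, key=percent.get)
print(top_item, round(percent[top_item], 1))
print({factor: round(share, 1) for factor, share in factor_totals(percent).items()})
```

Rescaling to percentages keeps arc widths comparable across target variables whose raw importances sit on different scales; it does not change the ranking of items.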

Paper Structure

This paper contains 17 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: System architecture of the simulated SAV agent.
  • Figure 2: Example chord diagram illustrating item-level importance in predicting a target variable. Two latent factors are shown, each comprising multiple items used as predictors. The width of each arc represents the relative importance of the item, normalized so that the importances of all predictors sum to 100%. In this example, Item A3 emerges as the most influential of the six items.
  • Figure 3: Chord diagrams showing the relative importance of predictor items for each target factor. Each chord (arrow) connects a predictor item (start node) to a target variable (end node), and its width indicates that item's relative importance. For each target variable (i.e., each overall question), the relative importances of its predictor items sum to 100%.
  • Figure 4: Raincloud plots comparing survey scores with the highest-correlating sentiment features (among min, max, mean, median, and mode) from the LLM and from TextBlob. Each plot combines a kernel density estimate, a boxplot, and jittered points, and is annotated with Spearman p-values. Kernel density estimates were computed using Scott's rule for bandwidth selection.
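Figure 4 relates survey scores to per-interaction sentiment features formed by aggregating per-exchange scores (min, max, mean, median, mode) and uses Spearman correlation to judge alignment. The stdlib-only Python sketch below illustrates that comparison under stated assumptions: per-exchange scores are taken as already produced by either analyzer (for example, TextBlob's polarity in [-1, 1]), "highest-correlating" is interpreted as the largest absolute rho, and p-values are omitted (`scipy.stats.spearmanr` returns both rho and a p-value if SciPy is available). All helper names and data below are hypothetical.

```python
import statistics
from typing import Callable, Sequence

# Per-interaction aggregates listed in the Figure 4 caption. Each maps the
# per-exchange sentiment scores of one interaction to a single number.
AGGREGATORS: dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,  # Python 3.8+: returns the first mode on ties
}


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; undefined if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def best_feature(
    interactions: Sequence[Sequence[float]], survey: Sequence[float]
) -> tuple[str, float]:
    """Return the aggregate whose rho with the survey has the largest |rho|."""
    results = {}
    for name, agg in AGGREGATORS.items():
        feature = [agg(scores) for scores in interactions]
        results[name] = spearman_rho(feature, survey)
    name = max(results, key=lambda k: abs(results[k]))
    return name, results[name]


# Hypothetical data: per-exchange sentiment scores in [-1, 1] for four
# interactions, and each participant's self-reported sentiment rating.
interactions = [
    [0.1, 0.4, 0.3],
    [-0.2, 0.0, 0.8],
    [0.5, 0.6, 0.9],
    [-0.5, -0.1, -0.3],
]
survey = [5, 3, 7, 2]
print(best_feature(interactions, survey))
```

A rank-based measure such as Spearman's rho does not assume that survey ratings are interval-scaled, and average ranks handle tied ratings; ties between aggregates here are broken by their order in `AGGREGATORS`.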