Leveraging Large Language Models with Chain-of-Thought and Prompt Engineering for Traffic Crash Severity Analysis and Inference

Hao Zhen, Yucheng Shi, Yongcan Huang, Jidong J. Yang, Ninghao Liu

TL;DR

This work investigates whether large language models can infer crash severity from narratives generated from structured crash data, using Chain-of-Thought (CoT) reasoning and domain-informed prompt engineering (PE) to guide reasoning. Evaluating GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B on a multiclass severity task, the study finds that LLaMA3-70B generally achieves the strongest zero-shot performance, while PE and CoT improve accuracy and transparency across models. Prompt engineering helps address alignment issues in model outputs, and CoT provides interpretable reasoning about factors such as environment, driver behavior, and vehicle characteristics. The results highlight the potential of LLM-driven crash analysis and suggest that larger models, combined with structured prompting, can support more reliable severity inference in traffic safety contexts.

Abstract

Harnessing the power of Large Language Models (LLMs), this study explores the use of three state-of-the-art LLMs, specifically GPT-3.5-turbo, LLaMA3-8B, and LLaMA3-70B, for crash severity inference, framing it as a classification task. We generate textual narratives from original traffic crash tabular data using a pre-built template infused with domain knowledge. Additionally, we incorporate Chain-of-Thought (CoT) reasoning to guide the LLMs in analyzing the crash causes and then inferring the severity. This study also examines the impact of prompt engineering specifically designed for crash severity inference. The LLMs were tasked with crash severity inference to: (1) evaluate the models' capabilities in crash severity analysis, (2) assess the effectiveness of CoT and domain-informed prompt engineering, and (3) examine the models' reasoning abilities within the CoT framework. Our results showed that LLaMA3-70B consistently outperformed the other models, particularly in zero-shot settings. The CoT and prompt engineering techniques significantly enhanced performance, improving logical reasoning and addressing alignment issues. Notably, CoT offers valuable insights into LLMs' reasoning processes, unleashing their capacity to consider diverse factors such as environmental conditions, driver behavior, and vehicle characteristics in severity analysis and inference.
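
The narrative-generation step mentioned in the abstract (tabular crash record in, templated text out) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the field names, the template wording, and the `record_to_narrative` helper are hypothetical and are not the authors' actual template or dataset schema.

```python
from string import Template

# Hypothetical fields and wording; the paper's real template is not reproduced here.
FIELDS = ("weather", "road_surface", "driver_age", "driver_action", "vehicle_type")

NARRATIVE_TEMPLATE = Template(
    "The crash occurred in $weather weather on a $road_surface road. "
    "The driver was $driver_age years old and was $driver_action. "
    "The vehicle involved was a $vehicle_type."
)


def _clean(value):
    """Map missing or empty cells to the word 'unknown'; stringify the rest."""
    return "unknown" if value is None or value == "" else str(value)


def record_to_narrative(record: dict) -> str:
    """Turn one tabular crash record (column name -> value) into a text narrative."""
    values = {field: _clean(record.get(field)) for field in FIELDS}
    return NARRATIVE_TEMPLATE.substitute(values)


# Illustrative record, not taken from any real dataset.
example_record = {
    "weather": "rainy",
    "road_surface": "wet",
    "driver_age": 34,
    "driver_action": "turning left",
    "vehicle_type": "passenger car",
}

narrative = record_to_narrative(example_record)
print(narrative)
```

Routing every field through `_clean` keeps the narrative well-formed when a column is missing, which matters when the text is later passed to an LLM as the only description of the crash.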

Paper Structure

This paper contains 25 sections, 6 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Textual narrative generation
  • Figure 2: Zero-shot (ZS)
  • Figure 3: Zero-shot with CoT (ZS_CoT)
  • Figure 4: Zero-shot with prompt engineering (ZS_PE)
  • Figure 5: Zero-shot with prompt engineering & CoT (ZS_PE_CoT)
  • ...and 8 more figures
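
The four prompting settings named in Figures 2 to 5 (ZS, ZS_CoT, ZS_PE, ZS_PE_CoT) can be read as the combinations of two optional components. The sketch below shows one way to compose them. It rests on assumptions throughout: the label set, the instruction wording, and the `build_prompt` helper are hypothetical placeholders, not the prompts used in the paper.

```python
# Hypothetical severity labels and instruction text, for illustration only.
SEVERITY_LABELS = ("minor", "serious", "fatal")

PE_INSTRUCTION = (
    "You are assisting a traffic safety study. "
    "Answer with exactly one of the listed severity labels."
)
COT_INSTRUCTION = (
    "First reason step by step about the likely causes of the crash, "
    "then state the severity."
)


def build_prompt(narrative: str, use_pe: bool = False, use_cot: bool = False) -> str:
    """Assemble a zero-shot prompt, optionally adding PE and/or CoT instructions."""
    parts = []
    if use_pe:
        parts.append(PE_INSTRUCTION)
    parts.append(f"Crash description: {narrative}")
    parts.append(
        "Classify the crash severity as one of: " + ", ".join(SEVERITY_LABELS) + "."
    )
    if use_cot:
        parts.append(COT_INSTRUCTION)
    return "\n".join(parts)


# The four settings as flag combinations.
SETTINGS = {
    "ZS": {"use_pe": False, "use_cot": False},
    "ZS_CoT": {"use_pe": False, "use_cot": True},
    "ZS_PE": {"use_pe": True, "use_cot": False},
    "ZS_PE_CoT": {"use_pe": True, "use_cot": True},
}

sample_narrative = "The crash occurred in rainy weather on a wet road."
prompts = {
    name: build_prompt(sample_narrative, **flags) for name, flags in SETTINGS.items()
}

for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```

Keeping the two components as independent flags makes the four settings directly comparable, since each prompt differs from the plain zero-shot prompt only by the instruction being tested.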