Domain-Adapted Pre-trained Language Models for Implicit Information Extraction in Crash Narratives

Xixi Wang, Jordanka Kovaceva, Miguel Costa, Shuai Wang, Francisco Camara Pereira, Robert Thomson

TL;DR

The paper tackles extracting structured information from unstructured crash narratives, focusing on MANCOLL (7 classes) and per-vehicle CRASHTYPE (98 classes). It introduces a domain-adapted approach using LoRA-based fine-tuning on open-source PLMs and end-to-end BERT fine-tuning, framing CRASHTYPE with a 13-subtask decomposition guided by CRASHCONF. On the CISS dataset, compact PLMs achieve state-of-the-art performance, outperforming GPT-4o in many settings while requiring far fewer resources, and they can even correct mislabeled annotations in the data. The study demonstrates robust data annotation improvements, privacy-preserving deployment, and strong consistency across models, highlighting the practical impact for scalable, safe crash-narrative analysis and policy support.
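The subtask decomposition mentioned above can be pictured as a routing step: a crash-configuration label selects one subtask, and the model then chooses only among that subtask's candidate classes rather than all 98. The following is a minimal sketch of that idea, not the authors' implementation. The labels in `SUBTASKS`, the prompt wording, and the `model` callable are hypothetical placeholders, and this summary does not say how the CRASHCONF label is obtained.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical label space: two configurations instead of the paper's 13,
# with placeholder class names instead of real CRASHTYPE codes.
SUBTASKS: Dict[str, List[str]] = {
    "config_A": ["type_01", "type_02"],
    "config_B": ["type_03", "type_04", "type_05"],
}


def build_prompt(narrative: str, vehicle_id: int, config: str) -> str:
    """Build a subtask-specific prompt listing only that subtask's classes."""
    options = ", ".join(SUBTASKS[config])
    return (
        f"Crash narrative: {narrative}\n"
        f"Crash configuration: {config}\n"
        f"Choose the crash type of vehicle {vehicle_id} from: {options}"
    )


def classify_vehicle(
    narrative: str,
    vehicle_id: int,
    config: str,
    model: Callable[[str], str],
) -> Optional[str]:
    """Route to the subtask for `config`, then accept only a valid candidate."""
    prompt = build_prompt(narrative, vehicle_id, config)
    answer = model(prompt).strip()
    # Reject anything outside the reduced candidate set.
    return answer if answer in SUBTASKS[config] else None


def stub_model(prompt: str) -> str:
    """Stand-in for a fine-tuned language model; always returns one label."""
    return " type_04 "


result = classify_vehicle("V1 struck the rear of V2.", 1, "config_B", stub_model)
rejected = classify_vehicle("V1 struck the rear of V2.", 1, "config_A", stub_model)
```

Restricting the answer to the selected subtask's candidates is what keeps each decision small, even though the full label space is large.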

Abstract

Free-text crash narratives recorded in real-world crash databases have been shown to play a significant role in improving traffic safety. However, large-scale analyses remain difficult to implement, as there are no documented tools that can batch-process the unstructured, non-standardized text written by various authors with diverse experience and attention to detail. In recent years, Transformer-based pre-trained language models (PLMs), such as Bidirectional Encoder Representations from Transformers (BERT) and large language models (LLMs), have demonstrated strong capabilities across various natural language processing tasks. These models can extract explicit facts from crash narratives, but their performance declines on inference-heavy tasks such as Crash Type identification, which can involve nearly 100 categories. Moreover, relying on closed LLMs through external APIs raises privacy concerns for sensitive crash data, and these black-box tools often underperform due to limited domain knowledge. Motivated by these challenges, we study whether compact open-source PLMs can support reasoning-intensive extraction from crash narratives. We target two challenging objectives on real-world crash narratives: 1) identifying the Manner of Collision for a crash, and 2) identifying the Crash Type for each vehicle involved in the crash event. To bridge domain gaps, we inject task-specific knowledge by fine-tuning LLMs with Low-Rank Adaptation (LoRA) and by fine-tuning BERT. Experiments on the authoritative real-world dataset Crash Investigation Sampling System (CISS) demonstrate that our fine-tuned compact models outperform strong closed LLMs, such as GPT-4o, while requiring only minimal training resources. Further analysis reveals that the fine-tuned PLMs can capture richer narrative details and even correct some mislabeled annotations in the dataset.
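To illustrate why LoRA fine-tuning needs so few training resources, here is a minimal pure-Python sketch of the standard LoRA formulation: the pre-trained weight W stays frozen, and only a low-rank update B·A, scaled by alpha / r, is trained. The matrices, the `alpha` value, and the 4096-wide layer in the parameter count are illustrative assumptions, not settings reported by the paper.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute h = W x + (alpha / rank) * B (A x); W is frozen, A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    return d_out * d_in, rank * (d_in + d_out)


# Toy layer with d_out = 2, d_in = 3, rank = 1 (illustrative values only).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[0.5, -0.5, 1.0]]
B_init = [[0.0], [0.0]]      # B starts at zero, so training begins from W
B_trained = [[1.0], [2.0]]   # pretend values after some training
x = [1.0, 2.0, 3.0]

h_init = lora_forward(W, A, B_init, x, alpha=2.0, rank=1)
h_trained = lora_forward(W, A, B_trained, x, alpha=2.0, rank=1)
full, lora = param_counts(4096, 4096, 8)
```

With B initialized to zero, the adapted layer reproduces the pre-trained output exactly. For the assumed 4096 × 4096 matrix at rank 8, the trainable parameter count drops from 16,777,216 to 65,536.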

Paper Structure

This paper contains 30 sections, 7 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Hierarchical mapping of CRASHTYPE in a two-step classification procedure
  • Figure 2: Overview of the two information extraction tasks: (a) Manner of collision extracted from the crash narrative with a fixed CoT prompt and a small candidate set (7 classes). (b) Per-vehicle Crash Type extracted from an intertwined multi-vehicle narrative, using 13 task-specific prompts to predict among 98 fine-grained classes.
  • Figure 3: Overview of the proposed LoRA-based fine-tuning framework.
  • Figure 4: Prediction processes of the original LLM (top) and the LoRA fine-tuned LLM (bottom).
  • Figure 5: Accuracy (a) and Macro F1 (b) of LLaMA3-3B, BERT, and FastText under different noise ratios in the training data.
  • ...and 6 more figures