
Engineering Safety Requirements for Autonomous Driving with Large Language Models

Ali Nouri, Beatriz Cabrero-Daniel, Fredrik Törner, Håkan Sivencrona, Christian Berger

TL;DR

The paper tackles the challenge of engineering safety requirements for autonomous driving when requirement artifacts change frequently. It does so by designing an LLM-based prototype that generates and refines Hazard Analysis and Risk Assessment (HARA) outputs. Following the Design Science methodology, it runs three iterative cycles (one design cycle and two engineering cycles) with internal and external expert evaluations, including a real-world case study with an in-house LLM. The key contributions are a task-based automation pipeline for HARA, prompt-engineering strategies that improve explainability and granularity, and empirical insights from expert feedback on the generated safety goals and on limitations such as hallucinations and data dependence. The work shows that, with careful human oversight and iterative refinement, LLM-driven HARA can substantially speed up safety requirement generation while expert reviewers remain responsible for quality, with practical implications for DevOps-enabled safety workflows in automotive engineering.

Abstract

Changes and updates to requirement artifacts, which can be frequent in the automotive domain, are a challenge for SafetyOps. Large Language Models (LLMs), with their strong natural language understanding and generation capabilities, can play a key role in automatically refining and decomposing requirements after each update. In this study, we propose a prototype pipeline of prompts and LLMs that receives an item definition and outputs solutions in the form of safety requirements. The pipeline also reviews the requirement dataset and identifies redundant or contradictory requirements. We first identified the characteristics necessary for performing Hazard Analysis and Risk Assessment (HARA) and then defined tests to assess an LLM's capability to meet these criteria. We used design science with multiple iterations and had experts from different companies evaluate each cycle quantitatively and qualitatively. Finally, the prototype was implemented at a case company, and the responsible team evaluated its efficiency.
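The prompt chain described in the abstract (an item definition in, safety requirements out, followed by a review pass for redundant or contradictory requirements) can be sketched in Python, the language the authors name for their orchestration script. This is a minimal sketch, not the authors' implementation: the `llm` callable, the `SubTask`/`PipelineRun` types, the sub-task names, and the prompt templates are all hypothetical, and the stand-in model simply returns the last line of its prompt so the example runs without any real model.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical LLM interface: any function mapping a prompt string to generated text
# (for example, a wrapper around an in-house model). Not an API from the paper.
LLM = Callable[[str], str]


@dataclass
class SubTask:
    """One step of the prompt chain (names and templates are illustrative only)."""
    name: str
    template: str  # must contain a single "{input}" placeholder


@dataclass
class PipelineRun:
    # Every intermediate generation, keyed by sub-task name, kept for later inspection.
    outputs: dict[str, str] = field(default_factory=dict)


def run_pipeline(item_definition: str, steps: list[SubTask], llm: LLM) -> PipelineRun:
    """Chain sub-tasks: each prompt is filled with the previous step's output."""
    run = PipelineRun()
    current = item_definition
    for step in steps:
        prompt = step.template.format(input=current)
        current = llm(prompt)
        run.outputs[step.name] = current  # store, then pass on without human intervention
    return run


def to_review_table(requirements: list[str]) -> str:
    """Render the final requirements as a Markdown table for expert review."""
    rows = ["| ID | Safety requirement |", "| --- | --- |"]
    rows += [f"| SR-{i} | {req} |" for i, req in enumerate(requirements, start=1)]
    return "\n".join(rows)


def echo_llm(prompt: str) -> str:
    """Stand-in model for the demo: returns the last line of its prompt."""
    return prompt.splitlines()[-1]


if __name__ == "__main__":
    # Hypothetical sub-tasks; a real pipeline would use carefully engineered prompts.
    steps = [
        SubTask("hazards", "Identify hazards for this item definition:\n{input}"),
        SubTask("safety_goals", "Derive safety goals from these hazards:\n{input}"),
        SubTask("requirements", "Specify safety requirements, one per line:\n{input}"),
        SubTask("review", "Drop redundant or contradictory requirements, one per line:\n{input}"),
    ]
    run = run_pipeline("Item: example driving function", steps, echo_llm)
    print(to_review_table(run.outputs["review"].splitlines()))
```

Keeping every intermediate output in `PipelineRun.outputs` leaves each sub-task's generation available to the human reviewers, which matters given the hallucination risk the paper reports.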

Paper Structure

This paper contains 27 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: The design and engineering cycles used in the study. The dotted blue arrow shows the design cycles, of which the first is reported (No. 1). The green arrow shows the engineering cycles; two were carried out, and both are reported (No. 2 and 3).
  • Figure 2: The first three hazards in two generated HARAs, showing the LLM's prior knowledge of AEB in contrast to CAEM.
  • Figure 3: Final version of the pipeline for the LLM-based HARA tool. Generated outputs are stored and passed between sub-tasks automatically by a Python script, without human intervention. At the end, the specified safety requirements are presented in a human-readable table for expert review.
  • Figure 4: Average expert scores for the HARA criteria, sorted from highest (top) to lowest (bottom) to highlight their relative scores.
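Figure 4 ranks the HARA criteria by their average expert score. The aggregation behind such a chart can be sketched as follows, assuming each expert gives a numeric score per criterion. The function name `rank_criteria` and the input layout are hypothetical, and the example values are placeholders, not the study's data.

```python
from statistics import mean


def rank_criteria(scores: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Average each criterion's expert scores and sort from highest to lowest.

    Each criterion needs at least one score; statistics.mean raises
    StatisticsError on an empty list. Ties keep their input order,
    because sorted() stays stable even with reverse=True.
    """
    averages = {criterion: mean(values) for criterion, values in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder criteria and scores, purely for illustration.
    example = {"criterion_a": [3, 4], "criterion_b": [5, 4], "criterion_c": [2]}
    for name, avg in rank_criteria(example):
        print(f"{name}: {avg:.2f}")
```

Sorting the averaged pairs in descending order reproduces the top-to-bottom layout the caption describes.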