Engineering Safety Requirements for Autonomous Driving with Large Language Models
Ali Nouri, Beatriz Cabrero-Daniel, Fredrik Törner, Håkan Sivencrona, Christian Berger
TL;DR
The paper tackles the challenge of engineering safety requirements for autonomous driving in the presence of evolving standards by designing an LLM-based prototype that generates and refines Hazard Analysis and Risk Assessment (HARA) outputs. Following Design Science methodology, it runs three iterative cycles (one design cycle and two engineering cycles) with internal and external expert evaluations, including a case study at a company using an in-house LLM. The key contributions are a task-based automation pipeline for HARA, prompt-engineering strategies that improve explainability and granularity, and empirical insights from expert feedback on the generated safety goals and on limitations such as hallucinations and data dependence. The work demonstrates that, with careful human oversight and iterative refinement, LLM-driven HARA can substantially speed up safety requirement generation while maintaining quality through expert review, with practical implications for DevOps-enabled safety workflows in automotive engineering.
Abstract
Changes and updates to requirement artifacts, which can be frequent in the automotive domain, are a challenge for SafetyOps. Large Language Models (LLMs), with their impressive natural language understanding and generation capabilities, can play a key role in automatically refining and decomposing requirements after each update. In this study, we propose a prototype of a pipeline of prompts and LLMs that receives an item definition and outputs solutions in the form of safety requirements. This pipeline also performs a review of the requirement dataset and identifies redundant or contradictory requirements. We first identified the necessary characteristics for performing HARA and then defined tests to assess an LLM's capability in meeting these criteria. We used design science with multiple iterations and let experts from different companies evaluate each cycle quantitatively and qualitatively. Finally, the prototype was implemented at a case company and the responsible team evaluated its efficiency.
