Generative AI in Systems Engineering: A Framework for Risk Assessment of Large Language Models

Stefan Otten; Philipp Reis; Philipp Rigoll; Joshua Ransiek; Tobias Schürmann; Jacob Langner; Eric Sax

TL;DR

The paper addresses the difficulty of governing risk when LLMs are used in systems engineering. It proposes the LLM Risk Assessment Framework (LRF), a two-dimensional classification that maps autonomy level (0 to 3) and potential impact (Low to High) to a risk level and the corresponding validation needs. By grounding LLM deployment in principles modeled on established safety engineering, the framework enables consistent, transparent oversight across the engineering lifecycle and supports scalable, risk-aware adoption of generative AI tools. Contributions include formalizing the LRF, linking it to established standards (e.g., IEC 61508, ISO 26262), and illustrating its use with concrete examples such as a Requirements Checker and a Legal Case Assessment. The framework aims to reconcile the rapid evolution of AI with reliability, traceability, and control in SE, and the paper outlines future work on maturity metrics and standardization for AI assurance.

Abstract

The increasing use of Large Language Models (LLMs) offers significant opportunities across the engineering lifecycle, including requirements engineering, software development, process optimization, and decision support. Despite this potential, organizations face substantial challenges in assessing the risks associated with LLM use, resulting in inconsistent integration, unknown failure modes, and limited scalability. This paper introduces the LLM Risk Assessment Framework (LRF), a structured approach for evaluating the application of LLMs within Systems Engineering (SE) environments. The framework classifies LLM-based applications along two fundamental dimensions: autonomy, ranging from supportive assistance to fully automated decision making, and impact, reflecting the potential severity of incorrect or misleading model outputs on engineering processes and system elements. By combining these dimensions, the LRF enables consistent determination of corresponding risk levels across the development lifecycle. The resulting classification supports organizations in identifying appropriate validation strategies, levels of human oversight, and required countermeasures to ensure safe and transparent deployment. The framework thereby helps align the rapid evolution of AI technologies with established engineering principles of reliability, traceability, and controlled process integration. Overall, the LRF provides a basis for risk-aware adoption of LLMs in complex engineering environments and represents a first step toward standardized AI assurance practices in systems engineering.
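The two-dimensional classification described above can be illustrated with a small lookup table. The following is a minimal sketch, not the authors' published matrix: the source only states that autonomy ranges over levels 0 to 3 and impact from Low to High. The intermediate `MEDIUM` impact step, the three risk labels, every cell value in `RISK_MATRIX`, and the function name `risk_level` are hypothetical placeholders chosen for illustration.

```python
from enum import Enum


class Impact(Enum):
    """Potential impact of incorrect model output (Low to High per the source).

    MEDIUM is an assumed intermediate step, not confirmed by the source.
    """
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical risk matrix: keys are (autonomy level, impact), values are
# illustrative risk labels. The cell values are placeholders, NOT the
# assignments published in the paper.
RISK_MATRIX = {
    (0, Impact.LOW): "low",    (0, Impact.MEDIUM): "low",    (0, Impact.HIGH): "medium",
    (1, Impact.LOW): "low",    (1, Impact.MEDIUM): "medium", (1, Impact.HIGH): "medium",
    (2, Impact.LOW): "medium", (2, Impact.MEDIUM): "medium", (2, Impact.HIGH): "high",
    (3, Impact.LOW): "medium", (3, Impact.MEDIUM): "high",   (3, Impact.HIGH): "high",
}


def risk_level(autonomy: int, impact: Impact) -> str:
    """Look up the illustrative risk label for an autonomy level (0 to 3) and impact."""
    if autonomy not in range(4):
        raise ValueError("autonomy must be an integer from 0 to 3")
    return RISK_MATRIX[(autonomy, impact)]


if __name__ == "__main__":
    # A supportive assistant with low impact versus a fully automated,
    # high-impact application, under the placeholder matrix above.
    print(risk_level(0, Impact.LOW))
    print(risk_level(3, Impact.HIGH))
```

An explicit table is used here rather than a formula so that each cell can be reviewed and adjusted individually, which matches the framework's stated goal of consistent and transparent risk determination. An organization applying the LRF would replace the placeholder cells with the assignments defined in the paper.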
