CoDefeater: Using LLMs To Find Defeaters in Assurance Cases

Usman Gohar; Michael C. Hunter; Robyn R. Lutz; Myra B. Cohen

TL;DR

Assurance cases for safety-critical systems can be undermined by undetected defeaters. CoDefeater uses LLMs (GPT-3.5) in a zero-shot setting to automatically identify defeaters in two real-world assurance cases, supporting a human-in-the-loop workflow. The study provides empirical evidence that LLMs can identify most ground-truth defeaters and generate novel, feasible defeaters, and it contributes a defeater-rich assurance fragment for further research. These findings suggest a practical path to improve the completeness, soundness, and confidence of assurance cases and accelerate safety certification processes.

Abstract

Constructing assurance cases is a widely used, and sometimes required, process for demonstrating that safety-critical systems will operate safely in their planned environment. To mitigate the risk of errors and missing edge cases, the concept of defeaters (arguments or evidence that challenge claims in an assurance case) has been introduced. Defeaters can provide timely detection of weaknesses in the arguments, prompting further investigation and mitigation. However, capturing defeaters relies on expert judgment, experience, and creativity, and must be done iteratively because requirements and regulations evolve. This paper proposes CoDefeater, an automated process that leverages large language models (LLMs) to find defeaters. Initial results on two systems show that LLMs can efficiently find known and unforeseen feasible defeaters to support safety analysts in enhancing the completeness and confidence of assurance cases.

Paper Structure (12 sections, 5 figures, 1 table)

Introduction
Background and Related Work
Methodology
Experimental Setup
Prompt Design
Evaluation Criteria
Threats to Validity
Results
(RQ1): Effectiveness in Identifying Defeaters
(RQ2): Utility in Generating Novel Defeaters
Discussion
Conclusion and Future Work

Figures (5)

Figure 1: Overview of CoDefeater.
Figure 2: An Assurance Case fragment with three example defeaters for Claim 1.2.
Figure 3: The system prompt used in the study.
Figure 4: A sample claim from the LHC assurance case, together with the claim's ground-truth defeaters (left) and LLM-generated (ChatGPT) defeaters (right), color-coded to represent the level of agreement between the two: complete match (green), partial match (blue), and no match (no color).
Figure 5: (Performance). Distribution of defeaters across coding categories. Cohen's kappa showed almost perfect agreement beyond chance.
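
The Figure 5 caption reports inter-rater agreement with Cohen's kappa. As a reminder of what that statistic measures, here is a minimal sketch in Python. It assumes two raters each assign every defeater one of the match levels named in Figure 4 (complete, partial, or no match). The function name and the label lists are hypothetical illustrations, not the paper's code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for illustration only; NOT data from the paper.
rater_1 = ["complete", "partial", "none", "complete", "partial", "complete"]
rater_2 = ["complete", "partial", "none", "complete", "none", "complete"]
kappa = cohens_kappa(rater_1, rater_2)
```

With these made-up labels the raters agree on 5 of 6 items and chance agreement is 13/36, so the function returns 17/23 (about 0.74). On the commonly used Landis and Koch scale, "almost perfect" agreement corresponds to values above 0.80.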
