Evaluating the Effectiveness of GPT-4 Turbo in Creating Defeaters for Assurance Cases

Kimya Khakzad Shahandashti; Mithila Sivakumar; Mohammad Mahdi Mohajer; Alvine B. Belle; Song Wang; Timothy C. Lethbridge

TL;DR

This paper addresses the challenge of identifying defeaters in assurance cases (ACs) formalized with Eliminative Argumentation (EA). It proposes using GPT-4 Turbo to identify defeaters in EA-based ACs automatically, and it focuses on Phase I of a three-phase plan, which covers rule extraction and proficiency assessment. The authors extract EA structural and semantic rules, design 22 assessment questions, and report that GPT-4 Turbo achieves an overall average rating of 1.40, with a Kendall correlation of 0.75 between raters. The model performs best on structural tasks and defeater generation and is weaker on semantics. The findings suggest that GPT-4 Turbo can bootstrap automated defeater analysis for ACs, paving the way for Phase II and Phase III work on defeater mitigation and formal validation in safety-critical industries.

Abstract

Assurance cases (ACs) are structured arguments that support the verification of the correct implementation of systems' non-functional requirements, such as safety and security, thereby preventing system failures that could lead to catastrophic outcomes, including loss of life. ACs facilitate the certification of systems in accordance with industrial standards, for example, DO-178C and ISO 26262. Identifying defeaters, that is, arguments that refute these ACs, is essential for improving the robustness of, and confidence in, ACs. To automate this task, we introduce a novel method that leverages the capabilities of GPT-4 Turbo, an advanced Large Language Model (LLM) developed by OpenAI, to identify defeaters within ACs formalized using the Eliminative Argumentation (EA) notation. Our initial evaluation gauges the model's proficiency in understanding and generating arguments within this framework. The findings indicate that GPT-4 Turbo excels in EA notation and is capable of generating various types of defeaters.
