Evaluating the Effectiveness of GPT-4 Turbo in Creating Defeaters for Assurance Cases
Kimya Khakzad Shahandashti, Mithila Sivakumar, Mohammad Mahdi Mohajer, Alvine B. Belle, Song Wang, Timothy C. Lethbridge
TL;DR
This paper addresses the challenge of identifying defeaters in assurance cases (ACs) formalized with Eliminative Argumentation (EA). It proposes using GPT-4 Turbo to automatically identify defeaters within EA-based ACs, focusing on Phase I of a three-phase plan that includes rule extraction and proficiency assessment. The authors extract EA structural and semantic rules, design 22 assessment questions, and report that GPT-4 Turbo achieves an overall average rating of 1.40 with a Kendall correlation of 0.75 between raters, excelling in structural tasks and defeater generation but lagging on semantics. The findings suggest GPT-4 Turbo can bootstrap automated defeater analysis for ACs, enabling Phase II and Phase III work on defeater mitigation and formal validation in safety-critical industries.
Abstract
Assurance cases (ACs) are structured arguments that support the verification of the correct implementation of systems' non-functional requirements, such as safety and security, thereby helping to prevent system failures that could lead to catastrophic outcomes, including loss of life. ACs facilitate the certification of systems in accordance with industrial standards, for example, DO-178C and ISO 26262. Identifying defeaters (arguments that refute these ACs) is essential for improving the robustness of, and confidence in, ACs. To automate this task, we introduce a novel method that leverages the capabilities of GPT-4 Turbo, an advanced Large Language Model (LLM) developed by OpenAI, to identify defeaters within ACs formalized using the Eliminative Argumentation (EA) notation. Our initial evaluation gauges the model's proficiency in understanding and generating arguments within this framework. The findings indicate that GPT-4 Turbo excels in EA notation and is capable of generating various types of defeaters.
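The TL;DR reports a Kendall correlation between raters as the agreement measure. As a rough illustration of how such a figure can be computed, the sketch below implements Kendall's tau-b in plain Python. It assumes the tau-b variant (the summary does not say which variant was used), and the function name `kendall_tau_b` and the two rating lists are hypothetical placeholders, not the study's data.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length rating sequences (tie-aware)."""
    if len(x) != len(y):
        raise ValueError("rating lists must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    # Compare every pair of rated items once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied for both raters: excluded from every count
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both raters order the pair the same way
        else:
            discordant += 1  # the raters order the pair oppositely
        
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when a rater gives constant ratings")
    return (concordant - discordant) / denom


# Hypothetical ratings from two raters over six assessment questions
# (illustrative values only, not taken from the paper).
rater_a = [1, 2, 1, 3, 2, 1]
rater_b = [1, 2, 2, 3, 2, 1]
print(f"Kendall tau-b: {kendall_tau_b(rater_a, rater_b):.2f}")
```

Values near 1 indicate that the two raters rank the model's answers in nearly the same order, values near 0 indicate no consistent relationship, and negative values indicate opposing rankings.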
