Automatic Instantiation of Assurance Cases from Patterns Using Large Language Models

Oluwafemi Odu, Alvine B. Belle, Song Wang, Segla Kpodjedo, Timothy C. Lethbridge, Hadi Hemmati

TL;DR

This work formalizes assurance case patterns (ACPs) into predicate-based rules aligned with the Goal Structuring Notation (GSN) and evaluates two large language models (LLMs), GPT-4o and GPT-4 Turbo, on automatically instantiating assurance cases (ACs) from these patterns. Across six ACPs and five grounded ACs spanning aviation, automotive, medical, and computing domains, the study shows that LLMs can generate pattern-compliant ACs, and that performance benefits substantially from domain information, one-shot exemplars, and predicate-based guidance; however, pattern-specific nuances such as multiplicity (cardinality) remain challenging. Quantitatively, GPT-4o generally outperforms GPT-4 Turbo on semantic similarity and exact-match metrics when prompts include software engineering (SE) knowledge, though there are domain-dependent exceptions. The study concludes that, while LLMs substantially reduce manual effort, a semi-automatic workflow with human refinement remains the most reliable path to certifiable assurance cases, and it lays groundwork for extending automatic instantiation to broader software engineering artifacts.

Abstract

An assurance case is a structured set of arguments supported by evidence, demonstrating that a system's non-functional requirements (e.g., safety, security, reliability) have been correctly implemented. Assurance case patterns serve as templates derived from previous successful assurance cases, aimed at facilitating the creation of new assurance cases. Despite the use of these patterns to generate assurance cases, their instantiation remains a largely manual and error-prone process that heavily relies on domain expertise. Thus, exploring techniques to support their automatic instantiation becomes crucial. This study aims to investigate the potential of Large Language Models (LLMs) in automating the generation of assurance cases that comply with specific patterns. Specifically, we formalize assurance case patterns using predicate-based rules and then utilize LLMs, i.e., GPT-4o and GPT-4 Turbo, to automatically instantiate assurance cases from these formalized patterns. Our findings suggest that LLMs can generate assurance cases that comply with the given patterns. However, this study also highlights that LLMs may struggle with understanding some nuances related to pattern-specific relationships. While LLMs exhibit potential in the automatic generation of assurance cases, their capabilities still fall short compared to human experts. Therefore, a semi-automatic approach to instantiating assurance cases may be more practical at this time.
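
To make the idea of a predicate-based pattern more concrete, the following is a minimal sketch of how a GSN-style pattern and a compliance check for an instantiated case might look. It is an illustration under stated assumptions, not the authors' formalization: the predicate names (`Goal`, `Strategy`, `SupportedBy`, `Multiplicity`), the brace placeholder syntax, the example system and hazards, and the function `check_instance` are all hypothetical and do not come from the paper.

```python
import re

# Hypothetical placeholder syntax: "{name}" marks text to be filled in on instantiation.
PLACEHOLDER = re.compile(r"\{[^{}]+\}")

# Hypothetical predicate-style facts describing a tiny GSN-like pattern.
# ("Multiplicity", parent, child, lo, hi): each parent needs lo..hi children; hi=None means unbounded.
PATTERN = [
    ("Goal", "G1", "{system} is acceptably safe"),
    ("Strategy", "S1", "Argument over each identified hazard"),
    ("Goal", "G2", "Hazard {hazard} is mitigated"),
    ("SupportedBy", "G1", "S1"),
    ("SupportedBy", "S1", "G2"),
    ("Multiplicity", "S1", "G2", 1, None),
]

def check_instance(pattern, nodes, links):
    """Return a list of violations (empty if the instance complies).

    nodes: (instance_id, pattern_element_id, text) triples
    links: (parent_instance_id, child_instance_id) pairs
    """
    violations = []
    pattern_of = {node_id: pid for node_id, pid, _ in nodes}

    # Rule 1: an instantiated case should contain no unresolved placeholders.
    for node_id, _, text in nodes:
        if PLACEHOLDER.search(text):
            violations.append(f"{node_id}: unresolved placeholder in '{text}'")

    # Rule 2: every multiplicity fact must hold for each matching parent node.
    for fact in pattern:
        if fact[0] != "Multiplicity":
            continue
        _, parent_pid, child_pid, lo, hi = fact
        for node_id, pid, _ in nodes:
            if pid != parent_pid:
                continue
            count = sum(
                1 for parent, child in links
                if parent == node_id and pattern_of.get(child) == child_pid
            )
            if count < lo or (hi is not None and count > hi):
                upper = "*" if hi is None else hi
                violations.append(
                    f"{node_id}: expected {lo}..{upper} children of {child_pid}, found {count}"
                )
    return violations

# Made-up instance: one top goal, one strategy, two hazard sub-goals.
nodes = [
    ("G1", "G1", "ExampleSystem is acceptably safe"),
    ("S1", "S1", "Argument over each identified hazard"),
    ("G2.1", "G2", "Hazard H1 is mitigated"),
    ("G2.2", "G2", "Hazard H2 is mitigated"),
]
links = [("G1", "S1"), ("S1", "G2.1"), ("S1", "G2.2")]

print(check_instance(PATTERN, nodes, links))
```

A check of this kind could support the semi-automatic workflow the abstract recommends: an LLM proposes the instantiated case, and a simple rule checker flags structural issues such as multiplicity violations for a human reviewer.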

Paper Structure (76 sections, 11 figures, 10 tables)

Figures (11)

  • Figure 1: On the right, an example of a partial safety case (GSN diagram) adapted from b106; on the left, the equivalent of the safety case in structured prose
  • Figure 2: A sample assurance case pattern adapted from b32
  • Figure 3: High-level overview of our approach
  • Figure 4: A simple predicate-based representation of the assurance case pattern shown in the appendix (figure label: ACAS_XU_ACP)
  • Figure 7: Generic structure of our System Prompts for Experiments with SE Knowledge
  • ...and 6 more figures