Automatic Instantiation of Assurance Cases from Patterns Using Large Language Models
Oluwafemi Odu, Alvine B. Belle, Song Wang, Segla Kpodjedo, Timothy C. Lethbridge, Hadi Hemmati
TL;DR
This work formalizes assurance case patterns (ACPs) as predicate-based rules aligned with the Goal Structuring Notation (GSN) and evaluates two large language models, GPT-4o and GPT-4 Turbo, on automatically instantiating assurance cases (ACs) from these patterns. Across six ACPs and five grounded ACs spanning the aviation, automotive, medical, and computing domains, the study shows that LLMs can generate pattern-compliant ACs, and that performance benefits substantially from domain information, one-shot exemplars, and predicate-based guidance. Pattern-specific nuances such as multiplicity cardinality, however, remain challenging. Quantitatively, GPT-4o generally outperforms GPT-4 Turbo on semantic similarity and exact-match metrics in SE-informed prompts, though there are domain-dependent exceptions. The study concludes that although LLMs substantially reduce manual effort, a semi-automatic workflow with human refinement remains the most reliable path to certifiable assurance cases, and it lays the groundwork for extending automatic instantiation to broader software engineering artifacts.
Abstract
An assurance case is a structured set of arguments supported by evidence, demonstrating that a system's non-functional requirements (e.g., safety, security, reliability) have been correctly implemented. Assurance case patterns serve as templates derived from previous successful assurance cases, aimed at facilitating the creation of new assurance cases. Although these patterns are used to generate assurance cases, their instantiation remains a largely manual and error-prone process that relies heavily on domain expertise. Exploring techniques to support their automatic instantiation is therefore crucial. This study investigates the potential of Large Language Models (LLMs) to automate the generation of assurance cases that comply with specific patterns. Specifically, we formalize assurance case patterns using predicate-based rules and then use two LLMs, GPT-4o and GPT-4 Turbo, to automatically instantiate assurance cases from these formalized patterns. Our findings suggest that LLMs can generate assurance cases that comply with the given patterns. However, the study also shows that LLMs may struggle to understand some nuances of pattern-specific relationships. While LLMs show potential for the automatic generation of assurance cases, their capabilities still fall short of those of human experts. Therefore, a semi-automatic approach to instantiating assurance cases may be more practical at this time.
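To make the idea of predicate-based pattern rules more concrete, the following is a minimal sketch in Python, not the formalization used in this study. It assumes a hypothetical `SupportedBy` rule that constrains how many child elements of one GSN kind may support a parent of another kind, and a hypothetical `violations` helper that checks an instantiated case against such rules. All names, element texts, and the example multiplicity are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical GSN element kinds; illustrative only, not the paper's notation.
GOAL, STRATEGY, SOLUTION = "Goal", "Strategy", "Solution"


@dataclass(frozen=True)
class Element:
    ident: str
    kind: str
    text: str


@dataclass(frozen=True)
class SupportedBy:
    """Assumed rule shape: each `parent_kind` element must be supported by
    between `low` and `high` elements of `child_kind` (None = unbounded)."""
    parent_kind: str
    child_kind: str
    low: int
    high: Optional[int] = None


def violations(elements, links, rules):
    """Return one message per (rule, parent element) pair that breaks a rule."""
    kind_of = {e.ident: e.kind for e in elements}
    problems = []
    for rule in rules:
        for e in elements:
            if e.kind != rule.parent_kind:
                continue
            # Count direct children of the required kind.
            count = sum(
                1
                for parent, child in links
                if parent == e.ident and kind_of.get(child) == rule.child_kind
            )
            too_few = count < rule.low
            too_many = rule.high is not None and count > rule.high
            if too_few or too_many:
                upper = "*" if rule.high is None else rule.high
                problems.append(
                    f"{e.ident}: {count} {rule.child_kind} children, "
                    f"expected {rule.low}..{upper}"
                )
    return problems


# A tiny instantiated case, as an LLM might return it (made-up content).
elements = [
    Element("G1", GOAL, "System is acceptably safe"),
    Element("S1", STRATEGY, "Argue over each identified hazard"),
    Element("G2", GOAL, "Hazard H1 is mitigated"),
    Element("Sn1", SOLUTION, "Test report for H1"),
]
links = [("G1", "S1"), ("S1", "G2"), ("G2", "Sn1")]

# Assumed pattern rule: a strategy needs at least two supporting sub-goals.
rules = [SupportedBy(STRATEGY, GOAL, low=2)]

print(violations(elements, links, rules))
```

A checker of this kind could support the semi-automatic workflow suggested above: a generated case that violates a multiplicity rule is flagged for human refinement rather than accepted as is.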
