STAF: Leveraging LLMs for Automated Attack Tree-Based Security Test Generation

Tanmay Khule, Stefan Marksteiner, Jose Alguindigue, Hannes Fuchs, Sebastian Fischmeister, Apurva Narayan

TL;DR

STAF (Security Test Automation Framework) automates the generation of executable security test cases from automotive attack trees using a four-stage self-corrective Retrieval-Augmented Generation (RAG) pipeline. By combining LLM-guided threat analysis, adaptive retrieval over automotive threat intelligence, and iterative test-case refinement, STAF produces structured Python tests and, when needed, corresponding LTL properties. Key contributions include a concrete four-stage workflow, integration with AVL ThreatGuard and automotive test libraries, and improvements in alignment, completeness, and runnability over vanilla LLM approaches, shown in a Battery Management System (BMS) case study. The work offers a scalable, domain-adapted approach to automating automotive security testing, bridging threat modeling and practical validation within existing testing ecosystems.

Abstract

In modern automotive development, security testing is critical for safeguarding systems against increasingly advanced threats. Attack trees are widely used to systematically represent potential attack vectors, but generating comprehensive test cases from these trees remains a labor-intensive, error-prone task that has seen limited automation in the context of testing vehicular systems. This paper introduces STAF (Security Test Automation Framework), a novel approach to automating security test case generation. Leveraging Large Language Models (LLMs) and a four-step self-corrective Retrieval-Augmented Generation (RAG) framework, STAF automates the generation of executable security test cases from attack trees, providing an end-to-end solution that encompasses the entire attack surface. In particular, we show the elements and processes needed for an LLM to produce sensible and executable automotive security test suites, along with the integration with an automated testing framework. We further compare our tailored approach with general-purpose (vanilla) LLMs and compare the performance of different LLMs (namely GPT-4.1 and DeepSeek) using our approach. We also demonstrate the operation of our method step by step in a concrete case study. Our results show significant improvements in efficiency, accuracy, scalability, and ease of integration into any workflow, marking a substantial advancement in automating automotive security testing methodologies. By using threat analysis and risk assessments (TARAs) as an input for verification tests, we create synergies by connecting two vital elements of a secure automotive development process.
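To make the input format concrete, the following is a minimal sketch of an attack tree with AND/OR gates and a helper that enumerates the leaf-step combinations a test generator would need to cover. It is not taken from the paper: the `AttackNode` class, the `attack_scenarios` function, and the example BMS tree are all hypothetical illustrations.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass
class AttackNode:
    """Hypothetical attack tree node; gate is "AND", "OR", or "LEAF"."""
    name: str
    gate: str = "LEAF"
    children: List["AttackNode"] = field(default_factory=list)


def attack_scenarios(node: AttackNode) -> List[List[str]]:
    """Return every combination of leaf steps that achieves `node`."""
    if node.gate == "LEAF":
        return [[node.name]]
    child_sets = [attack_scenarios(child) for child in node.children]
    if node.gate == "OR":
        # Any single child suffices, so collect each child's scenarios.
        return [scenario for sets in child_sets for scenario in sets]
    if node.gate == "AND":
        # All children are required, so combine one scenario per child.
        return [sum(combo, []) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate: {node.gate}")


# Illustrative tree only; node names are assumptions, not from the paper.
tree = AttackNode("Manipulate charge limit", "OR", [
    AttackNode("Inject spoofed CAN frame", "AND", [
        AttackNode("Gain access to CAN bus"),
        AttackNode("Forge charge-limit message"),
    ]),
    AttackNode("Flash modified firmware"),
])

scenarios = attack_scenarios(tree)
for steps in scenarios:
    print(" + ".join(steps))
```

Each returned scenario is one candidate for a generated test case; a real pipeline would attach threat metadata to the nodes rather than plain names.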

Paper Structure

This paper contains 21 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Workflow of STAF's self-corrective information retrieval mechanism. This process keeps the knowledge base relevant and up to date by combining vector data store retrieval with web queries when necessary, enhancing the accuracy of generated security test cases. If applicable, protocol Mealy models in DOT format are provided in the initial test generation prompt.
  • Figure 2: Architecture of the Battery Management System used as system under test.
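The retrieval behavior described in the Figure 1 caption (vector store first, web queries when necessary) can be sketched roughly as follows. This is an illustrative assumption, not STAF's implementation: the function `self_corrective_retrieve`, the `min_docs` threshold, the keyword-based relevance check, and the toy documents are all hypothetical stand-ins for a real vector store, web search, and LLM-based grader.

```python
from typing import Callable, List

Search = Callable[[str], List[str]]


def self_corrective_retrieve(
    query: str,
    vector_search: Search,
    web_search: Search,
    is_relevant: Callable[[str, str], bool],
    min_docs: int = 2,  # assumed threshold for "enough" local context
) -> List[str]:
    """Query the local store first; fall back to the web if too little is relevant."""
    local = [doc for doc in vector_search(query) if is_relevant(query, doc)]
    if len(local) >= min_docs:
        return local
    # Corrective step: supplement the local results with filtered web results.
    web = [doc for doc in web_search(query) if is_relevant(query, doc)]
    return local + web


def keyword_relevant(query: str, doc: str) -> bool:
    """Toy relevance grader: true if any query word appears in the document."""
    return any(word in doc.lower() for word in query.lower().split())


# Toy data standing in for a vector store and a web search backend.
store = ["CAN bus injection on battery controllers", "Unrelated infotainment note"]
web = ["Recent advisory on CAN message spoofing"]

docs = self_corrective_retrieve(
    "CAN spoofing", lambda q: store, lambda q: web, keyword_relevant
)
print(docs)
```

Passing the search and grading functions as parameters keeps the fallback logic independent of any particular vector database or search API.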