Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

Xi Zheng; Aloysius K. Mok; Ruzica Piskac; Yong Jae Lee; Bhaskar Krishnamachari; Dakai Zhu; Oleg Sokolsky; Insup Lee

Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee

TL;DR

The paper tackles the challenge of formally verifying learning-enabled CPS, where data-driven components complicate traditional SDLC approaches. It proposes a three-stage roadmap: (1) Multi-Modal LLM-based Test Generation to harvest diverse, realistic test scenarios from rules and sensor data; (2) Data-Driven Model Learning via $L^*$ with a simulator oracle and SAT-based refinements to extract accurate system models; and (3) Model-Based Testing using $STL$-based properties and coverage-guided fuzzing to produce formally meaningful counterexamples. Early results show that LLM-driven DSLs can convert regulatory rules into actionable CARLA test cases and that multi-modal analyses can reveal accident causes and generate meaningful corner cases. Together, these contributions offer a viable path to bridging probabilistic testing with formal assurance for safety-critical CPS, potentially enabling formal guarantees in real-world autonomous systems and robotics applications.

Abstract

The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and the enabling of autonomous operations. This convergence has accelerated the development and deployment of a range of real-world applications, such as autonomous vehicles, delivery drones, service robots, and telemedicine procedures. However, the software development life cycle (SDLC) for AI-infused CPS diverges significantly from traditional approaches, featuring data and learning as two critical components. Existing verification and validation techniques are often inadequate for these new paradigms. In this study, we pinpoint the main challenges in ensuring formal safety for learningenabled CPS.We begin by examining testing as the most pragmatic method for verification and validation, summarizing the current state-of-the-art methodologies. Recognizing the limitations in current testing approaches to provide formal safety guarantees, we propose a roadmap to transition from foundational probabilistic testing to a more rigorous approach capable of delivering formal assurance.

Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

TL;DR

with a simulator oracle and SAT-based refinements to extract accurate system models; and (3) Model-Based Testing using

-based properties and coverage-guided fuzzing to produce formally meaningful counterexamples. Early results show that LLM-driven DSLs can convert regulatory rules into actionable CARLA test cases and that multi-modal analyses can reveal accident causes and generate meaningful corner cases. Together, these contributions offer a viable path to bridging probabilistic testing with formal assurance for safety-critical CPS, potentially enabling formal guarantees in real-world autonomous systems and robotics applications.

Abstract

Paper Structure (10 sections, 3 figures)

This paper contains 10 sections, 3 figures.

CONTEXT, MOTIVATIONS AND AIMS
RELATED WORK
A ROADMAP TOWARDS FORMAL TESTING
Multi-modal LLM based Test Generation
Data-driven Model Learning
Model-based Testing
EARLY RESULTS AND EFFORTS
RQ1: Test Case Generation Capability using LLM
RQ2: Accident Root Cause Analysis using Multi-Modal LLM
Conclusion

Figures (3)

Figure 1: Proposed Roadmap
Figure 2: High-level structure of the Scenario DSL
Figure 3: A single frame taken from a traffic collision video

Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

TL;DR

Abstract

Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

Authors

TL;DR

Abstract

Table of Contents

Figures (3)