Using Causal Inference to Test Systems with Hidden and Interacting Variables: An Evaluative Case Study
Michael Foster, Robert M. Hierons, Donghwan Shin, Neil Walkinshaw, Christopher Wild
TL;DR
The paper tackles the problem of testing complex, nondeterministic software with hidden and interacting variables by extending causal testing with effect modification and instrumental variable (IV) methods. It formulates a DAG-based testing framework that uses observational data and defines causal test cases to evaluate properties framed as treatment effects, while accounting for interactions and unobserved variables. Through an evaluative case study on CARLA, it demonstrates that incorporating interaction terms and IV approaches yields reliable test outcomes from limited, uncontrolled data, and can reveal faults that traditional methods miss. The work advances practical causal testing in software engineering, showing that controlled data collection is not always necessary and that expert-crafted causal models can guide robust test judgments, with implications for the validation of autonomous driving systems (ADS) and other high-cost, nondeterministic systems.
Abstract
Software systems with large parameter spaces, nondeterminism and high computational cost are challenging to test. Recently, software testing techniques based on causal inference have been successfully applied to systems that exhibit such characteristics, including scientific models and autonomous driving systems. One significant limitation is that these techniques are restricted to testing properties where all of the variables involved can be observed and where there are no interactions between variables. In practice, this is rarely guaranteed: the logging infrastructure may not be available to record all of the necessary runtime variable values, and a system output is often affected by complex interactions between variables. To address this, we leverage two additional concepts from causal inference, namely effect modification and instrumental variable methods. We build these concepts into an existing causal testing tool and conduct an evaluative case study which uses them to test three system-level requirements of CARLA, a high-fidelity driving simulator widely used in autonomous vehicle development and testing. The results show that we can obtain reliable test outcomes without requiring large amounts of highly controlled test data or instrumentation of the code, even when variables interact with each other and are not recorded in the test data.
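
To make the two concepts named in the abstract concrete, the following is a minimal sketch on synthetic data, not the authors' tool or their CARLA experiments. It assumes linear relationships, and every variable name, coefficient and sample size is invented for illustration. The first part shows an instrumental variable estimate (the simple ratio form, cov(z, y) / cov(z, x)) recovering a causal effect when a confounder is not recorded. The second part shows effect modification captured by an interaction term in a regression.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def iv_estimate(z, x, y):
    """Ratio (Wald) IV estimate of the effect of x on y using instrument z."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


rng = np.random.default_rng(0)
n = 20_000
ones = np.ones(n)

# --- Hidden variable: u drives both x and y but is assumed not to be logged ---
u = rng.normal(size=n)               # unobserved confounder (hypothetical)
z = rng.normal(size=n)               # instrument: influences y only through x
x = z + u + rng.normal(size=n)       # treatment
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true effect of x on y is 2.0

naive_effect = ols(np.column_stack([ones, x]), y)[1]  # biased by the hidden u
iv_effect = iv_estimate(z, x, y)                      # close to the true 2.0

# --- Interacting variables: the effect of t on w depends on the modifier m ---
m = rng.uniform(0.0, 2.0, size=n)    # effect modifier (hypothetical)
t = rng.normal(size=n)               # treatment
w = 1.0 * t + 0.5 * t * m + rng.normal(scale=0.1, size=n)

coef = ols(np.column_stack([ones, t, m, t * m]), w)


def effect_of_t_at(m0):
    """Estimated effect of a unit change in t when the modifier equals m0."""
    return coef[1] + coef[3] * m0


print(f"naive OLS effect: {naive_effect:.2f}, IV effect: {iv_effect:.2f}")
print(f"effect of t at m=0: {effect_of_t_at(0.0):.2f}, at m=2: {effect_of_t_at(2.0):.2f}")
```

In this synthetic setup the naive regression overstates the effect because it absorbs the influence of the unrecorded variable, whereas the instrument isolates the variation in the treatment that is independent of it. A causal test case could then compare such an estimate against the effect a requirement predicts.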
