Formal Quality Measures for Predictors in Markov Decision Processes
Christel Baier, Sascha Klüppelholz, Jakob Piribauer, Robin Ziemek
TL;DR
This paper addresses how to quantify the quality of predictors for undesired events in systems modeled as Markov decision processes (MDPs). It introduces two complementary approaches: (i) average-case quality measures, obtained by uniformly sampling memoryless randomized ($MR$) policies and evaluating classifier-style metrics such as precision, recall, and $F$-score, and (ii) a causal-volume viewpoint, which uses the strict (SPR) and global (GPR) probability-raising notions to quantify the fraction of policies under which the predictor is a probability-raising (PR) cause of the event. The authors formalize the $MR$ policy space as a product of simplices, use two-copy constructions to express confusion-matrix entries, and apply Monte Carlo sampling or linear/quadratic programming to compute the averages and solve the associated decision problems. They establish complexity results (SPR in P, GPR in NP) and provide practical guidance for estimating averages and causal volumes, illustrating the framework with network-like MDPs. Overall, the work offers a rigorous, dual-angle framework for evaluating predictors in probabilistic, non-deterministic control settings, with potential impact on the reliability of adaptive AI systems.
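The average-case idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy MDP, the state names, the helper functions, and the reading of the confusion matrix (true positive = the event occurs after a predictor state was visited; false negative = the event occurs without such a visit). One policy is drawn uniformly from the product of simplices by normalising independent Exp(1) samples per state, and a two-copy construction (each state paired with a "predictor seen" flag) yields the confusion-matrix entries.

```python
import random

# Hypothetical toy MDP (not from the paper): state -> action -> {successor: probability}.
MDP = {
    "init": {"a": {"warn": 0.8, "ok": 0.2},
             "b": {"warn": 0.1, "fail": 0.1, "ok": 0.8}},
    "warn": {"go": {"fail": 0.7, "ok": 0.3}},
    "fail": {},  # absorbing: the undesired event
    "ok": {},    # absorbing: safe termination
}
PREDICTOR = {"warn"}  # assumed predictor: a set of states
EVENT = "fail"


def sample_mr_policy(mdp, rng):
    """Draw one memoryless randomized policy uniformly from the product of simplices."""
    policy = {}
    for state, actions in mdp.items():
        if actions:
            # Normalised Exp(1) draws are Dirichlet(1, ..., 1), i.e. uniform on the simplex.
            weights = [rng.expovariate(1.0) for _ in actions]
            total = sum(weights)
            policy[state] = {a: w / total for a, w in zip(actions, weights)}
    return policy


def confusion_entries(mdp, policy, init="init"):
    """Return (tp, fp, fn, tn) for the Markov chain induced by the policy."""
    # Two-copy construction: pair each state with a flag "predictor already visited".
    dist = {(init, init in PREDICTOR): 1.0}
    done = {}  # probability mass that has reached an absorbing state
    while sum(dist.values()) > 1e-12:
        nxt = {}
        for (s, seen), mass in dist.items():
            if not mdp[s]:
                done[(s, seen)] = done.get((s, seen), 0.0) + mass
                continue
            for a, pa in policy[s].items():
                for t, pt in mdp[s][a].items():
                    key = (t, seen or t in PREDICTOR)
                    nxt[key] = nxt.get(key, 0.0) + mass * pa * pt
        dist = nxt
    tp = done.get((EVENT, True), 0.0)
    fn = done.get((EVENT, False), 0.0)
    fp = sum(m for (s, seen), m in done.items() if seen and s != EVENT)
    tn = sum(m for (s, seen), m in done.items() if not seen and s != EVENT)
    return tp, fp, fn, tn


def average_quality(n_samples=2000, seed=0):
    """Monte Carlo estimate of average precision, recall and F-score over MR policies."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(n_samples):
        tp, fp, fn, _tn = confusion_entries(MDP, sample_mr_policy(MDP, rng))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall > 0 else 0.0)
        for i, value in enumerate((precision, recall, f_score)):
            sums[i] += value
    return tuple(total / n_samples for total in sums)


if __name__ == "__main__":
    print(average_quality())
```

In this toy model the precision is the same under every policy, because the only way out of the predictor state is a single action, whereas the recall depends on how often the policy picks the action that can fail without warning. The propagation loop stops once the non-absorbed mass is negligible, so it also handles cyclic models in which an absorbing state is reached with probability one.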
Abstract
In adaptive systems, predictors are used to anticipate changes in the system's state or behavior that may require system adaptation, e.g., changing its configuration or adjusting resource allocation. Therefore, the quality of predictors is crucial for the overall reliability and performance of the system under control. This paper studies predictors in systems exhibiting probabilistic and non-deterministic behavior modelled as Markov decision processes (MDPs). The main contribution is the introduction of quantitative notions that measure the effectiveness of predictors in terms of their average capability to predict the occurrence of failures or other undesired system behaviors. The average is taken over all memoryless policies. We study two classes of such notions. The first is inspired by concepts that have been introduced in statistical analysis to explain the impact of features on the decisions of binary classifiers (such as precision, recall, and F-score). The second borrows ideas from recent work on probability-raising causality in MDPs and determines the quality of a predictor by the fraction of memoryless policies under which (the set of states in) the predictor is a probability-raising cause for the considered failure scenario.
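The second measure, the fraction of memoryless policies under which the predictor is a probability-raising cause, can likewise be estimated by sampling. The Python sketch below is illustrative only. The toy MDP, the names, and the helper functions are hypothetical, and the check assumes the global condition takes the form Pr(event | predictor visited) > Pr(event). The model is assumed to be acyclic so that a simple recursion suffices.

```python
import random

# Hypothetical acyclic toy MDP (not from the paper): state -> action -> {successor: probability}.
MDP = {
    "init": {"a": {"warn": 0.5, "ok": 0.5},
             "b": {"fail": 0.9, "ok": 0.1}},
    "warn": {"go": {"fail": 0.6, "ok": 0.4}},
    "fail": {},  # absorbing: the undesired event
    "ok": {},    # absorbing: safe termination
}
PREDICTOR = {"warn"}  # assumed predictor: a set of states
EVENT = "fail"


def sample_policy(rng):
    """Uniform draw from the product of simplices (normalised Exp(1) samples per state)."""
    policy = {}
    for state, actions in MDP.items():
        if actions:
            weights = [rng.expovariate(1.0) for _ in actions]
            total = sum(weights)
            policy[state] = {a: w / total for a, w in zip(actions, weights)}
    return policy


def reach(policy, state, seen=False):
    """Return (Pr(event and predictor visited), Pr(event), Pr(predictor visited))."""
    seen = seen or state in PREDICTOR
    if not MDP[state]:  # absorbing state: the path is complete
        hit = 1.0 if state == EVENT else 0.0
        return (hit if seen else 0.0, hit, 1.0 if seen else 0.0)
    both = event = pred = 0.0
    for action, pa in policy[state].items():
        for succ, pt in MDP[state][action].items():
            b, e, p = reach(policy, succ, seen)
            both += pa * pt * b
            event += pa * pt * e
            pred += pa * pt * p
    return both, event, pred


def is_probability_raising(policy):
    """Assumed global condition: Pr(event | predictor visited) > Pr(event)."""
    both, event, pred = reach(policy, "init")
    return pred > 0 and both / pred > event


def causal_volume(n_samples=4000, seed=0):
    """Fraction of sampled policies under which the predictor raises the event probability."""
    rng = random.Random(seed)
    hits = sum(is_probability_raising(sample_policy(rng)) for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    print(causal_volume())
```

In this toy model the predictor raises the probability of failure exactly when the policy chooses action `a` in `init` with probability above one half, so the estimate should lie close to 0.5.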
