Efficient Tree Generation for Globally Optimal Decisions under Probabilistic Outcomes
Berk Ozturk, She'ifa Punla-Green, Les Servi
TL;DR
The paper addresses policy generation for sequential decision problems with uncertain outcomes and interdependent actions by constructing globally optimal decision trees. Its methodology combines dynamic programming with mixed-integer linear optimization to produce trees that maximize expected reward, while reward-driven pruning and dominance analysis eliminate large portions of the state space. The approach has three stages: full graph generation with pruning, reduced graph construction via a Phi score recursion, and tree selection, which can optionally apply a secondary objective to minimize tree size. An LP formulation is also offered for validation and adversarial extensions. Computational results on randomized graphs show runtime scaling linearly in the number of explored states and substantial speed-ups over naive generation, which supports the method's scalability on large, structured problems. Potential applications span healthcare, wargaming, and cybersecurity, where the method could provide globally optimal, interpretable decision policies under uncertainty without extensive parameter tuning. The paper also points toward future adversarial extensions and further state-space reduction techniques.
Abstract
Many real-world problems require making sequences of decisions where the outcome of each decision is probabilistic and the availability of different actions is constrained by the outcomes of previous actions. There is a need to generate policies that are adaptive to uncertainty, globally optimal, and yet scalable as the state space grows. In this paper, we propose the generation of optimal decision trees, which dictate which actions should be implemented in different outcome scenarios, while maximizing the expected reward of the strategy. Using a combination of dynamic programming and mixed-integer linear optimization, the proposed methods scale to problems with large but finite state spaces, using problem-specific information to prune away large subsets of the state space that do not yield progress towards rewards. We demonstrate that the presented approach is able to find the globally optimal decision tree in linear time with respect to the number of states explored.
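To make the dynamic-programming idea concrete, the following is a minimal sketch of backward induction over a small acyclic decision graph. It is not the paper's algorithm: the Phi score recursion, pruning rules, and mixed-integer tree selection are not reproduced here, and the toy problem (`ACTIONS`, `REWARD`) and the helper names `solve` and `extract_tree` are hypothetical, introduced only for illustration. The sketch assumes rewards are collected at terminal states and that the graph has no cycles.

```python
from functools import lru_cache

# Hypothetical toy problem (not from the paper):
# state -> {action: [(probability, next_state), ...]}
ACTIONS = {
    "start": {
        "safe": [(1.0, "mid")],
        "risky": [(0.5, "win"), (0.5, "lose")],
    },
    "mid": {
        "push": [(0.8, "win"), (0.2, "lose")],
    },
}

# Assumed terminal rewards; states without actions are terminal.
REWARD = {"win": 10.0, "lose": 0.0}


@lru_cache(maxsize=None)
def solve(state):
    """Return (expected reward, best action) for a state.

    Memoization means each reachable state is evaluated once, so the
    work grows with the number of states and outcome edges explored.
    """
    if state not in ACTIONS:
        return REWARD.get(state, 0.0), None
    best_value, best_action = float("-inf"), None
    for action, outcomes in ACTIONS[state].items():
        # Expected reward of taking this action, then acting optimally.
        value = sum(p * solve(nxt)[0] for p, nxt in outcomes)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


def extract_tree(state):
    """Unroll the optimal policy into a nested decision tree."""
    value, action = solve(state)
    node = {"state": state, "value": value}
    if action is not None:
        node["action"] = action
        node["children"] = [
            extract_tree(nxt) for _, nxt in ACTIONS[state][action]
        ]
    return node


if __name__ == "__main__":
    print(extract_tree("start"))
```

In this toy instance the policy chooses `safe` at `start` and then `push` at `mid`, because the two-step route has a higher expected reward than the immediate gamble. The resulting tree branches only on outcomes of the chosen actions, which is the sense in which a decision tree dictates an action for each outcome scenario.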
