Decision Predicate Graphs: Enhancing Interpretability in Tree Ensembles
Leonardo Arrighi, Luca Pennella, Gabriel Marques Tavares, Sylvio Barbon Junior
TL;DR
This paper proposes Decision Predicate Graphs (DPG), a model-agnostic, graph-based tool for the global interpretation of tree-based ensembles that represents predicates (feature–value tests) as nodes and their co-occurrence frequencies as weighted edges. It formalizes the construction of a DPG from a trained ensemble, provides a complexity analysis ($O(b \times s \times k^2)$) and pseudo-code, and introduces graph-theoretic metrics (betweenness centrality, local reaching centrality, and community detection) to quantify decision importance and class structure. On the Iris dataset and a synthetic multiclass dataset, the paper shows how these metrics reveal influential predicates, classify decision pathways, and identify class-specific communities, offering insights beyond traditional visualisations. The work compares DPG with ADD-based graph representations, highlighting advantages in weighting, global metrics, and scalability, and outlines potential improvements and extensions to regression problems, broader datasets, and additional interpretability tools. Overall, DPG enhances the global interpretability of tree ensembles by integrating graph theory with predicate-path analysis, providing actionable insights while preserving model performance.
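To make the nodes-and-weighted-edges idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an edge links two predicates traversed consecutively on a sample's path through a tree and that its weight is the traversal count. The `paths` data, the predicate strings, and the helper names `build_dpg` and `local_reaching_centrality` are hypothetical, and the centrality shown is the simple unweighted variant (the fraction of other nodes reachable from a node).

```python
from collections import Counter, deque

# Hypothetical decision paths: one list per (sample, tree) traversal,
# each ending in a predicted class node. Values are illustrative only.
paths = [
    ["petal_len <= 2.5", "Class setosa"],
    ["petal_len > 2.5", "petal_wid <= 1.7", "Class versicolor"],
    ["petal_len > 2.5", "petal_wid > 1.7", "Class virginica"],
    ["petal_len > 2.5", "petal_wid <= 1.7", "Class versicolor"],
]

def build_dpg(paths):
    """Count how often each consecutive predicate pair is traversed."""
    edges = Counter()
    for path in paths:
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += 1
    return edges

def local_reaching_centrality(edges, node):
    """Fraction of the other nodes reachable from `node` (unweighted)."""
    nodes = {n for edge in edges for n in edge}
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
    # Breadth-first search over directed edges starting at `node`.
    seen, queue = {node}, deque([node])
    while queue:
        for nxt in successors.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return (len(seen) - 1) / (len(nodes) - 1)

dpg = build_dpg(paths)
lrc = local_reaching_centrality(dpg, "petal_len > 2.5")
```

In this toy graph the predicate `petal_len > 2.5` reaches four of the six other nodes, while class nodes reach none, which hints at how reachability separates early, influential predicates from terminal ones.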
Abstract
Understanding the decisions of tree-based ensembles and their relationships is pivotal for machine learning model interpretation. Recent attempts to mitigate the human-in-the-loop interpretation challenge have explored extracting the decision structure underlying the model by means of graph simplification and path emphasis. However, while these efforts enhance the visualisation experience, they may either result in a visually complex representation or compromise the interpretability of the original ensemble model. To address this challenge, especially in complex scenarios, we introduce the Decision Predicate Graph (DPG) as a model-agnostic tool that provides a global interpretation of the model. DPG is a graph structure that captures the tree-based ensemble model and details of the learned dataset, preserving the relations among features, logical decisions, and predictions in order to emphasise insightful points. Leveraging well-known graph theory concepts, such as centrality and community, DPG offers additional quantitative insights into the model, complementing visualisation techniques, expanding the problem space descriptions, and offering diverse possibilities for extensions. Empirical experiments demonstrate the potential of DPG in addressing traditional benchmarks and complex classification scenarios.
