Model-oriented Graph Distances via Partially Ordered Sets
Armeen Taeb, F. Richard Guo, Leonard Henckel
TL;DR
The authors address how to define a meaningful distance between graphs across probabilistic and causal graph classes by treating each graph as a statistical model and organizing graphs into a poset under model inclusion. They define the model-oriented distance $d_{\mathcal{L}}$ as the length of the shortest path in the Hasse diagram of the poset, prove that it is a metric under suitable conditions, and derive class-specific characterizations. The framework is instantiated for four graph families (UGs, DAGs, CPDAGs, MPDAGs), with structural results (gradedness in some cases, non-gradedness in others), bounds, and efficient A*-based algorithms for computing and bounding the distance. The work highlights substantial differences from the traditional structural Hamming distance (SHD) and structural intervention distance (SID), demonstrates tractable simplifications for polytrees, and provides practical algorithms to quantify semantic distances in both probabilistic and causal settings, enabling better benchmarking, consensus finding, and sensitivity analyses in structure learning.
Abstract
A well-defined distance on the parameter space is key to evaluating estimators, ensuring consistency, and building confidence sets. While there are typically standard distances to adopt in a continuous space, this is not the case for combinatorial parameters such as graphs that represent statistical models. Existing proposals like the structural Hamming distance are defined on the graphs rather than the models they represent and can hence lead to undesirable behaviors. We propose a model-oriented framework for defining the distance between graphs that is applicable across many different graph classes. Our approach treats each graph as a statistical model and organizes the graphs in a partially ordered set based on model inclusion. This induces a neighborhood structure, from which we define the model-oriented distance as the length of a shortest path through neighbors, yielding a metric in the space of graphs. We apply this framework to both probabilistic graphical models (e.g., undirected graphs and completed partially directed acyclic graphs) and causal graphical models (e.g., directed acyclic graphs and maximally oriented partially directed acyclic graphs). We analyze the theoretical and empirical behaviors of model-oriented distances. Algorithmic tools are also developed for computing and bounding these distances.
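The construction described in the abstract (order the graphs by model inclusion, connect neighbors in the resulting Hasse diagram, and take the shortest path length) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the function names `hasse_neighbors` and `poset_distance` are hypothetical, the example uses undirected graphs on three nodes with model inclusion assumed to coincide with edge-set inclusion, and plain breadth-first search stands in for the A*-based algorithms mentioned in the paper.

```python
from collections import deque
from itertools import combinations


def hasse_neighbors(elements, leq):
    """Build the Hasse diagram as an undirected adjacency map.

    Two elements are neighbors when one covers the other, i.e. x < y
    and no third element z satisfies x < z < y.
    """
    adj = {x: set() for x in elements}
    for x in elements:
        for y in elements:
            if x == y or not leq(x, y):
                continue
            between = any(
                z != x and z != y and leq(x, z) and leq(z, y)
                for z in elements
            )
            if not between:
                adj[x].add(y)
                adj[y].add(x)
    return adj


def poset_distance(adj, source, target):
    """Shortest path length between two elements via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if x == target:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None  # target not reachable from source


# Toy poset (assumption): all undirected graphs on nodes a, b, c,
# each encoded as a frozenset of edges and ordered by edge inclusion.
nodes = ["a", "b", "c"]
all_edges = [frozenset(e) for e in combinations(nodes, 2)]
graphs = [
    frozenset(subset)
    for r in range(len(all_edges) + 1)
    for subset in combinations(all_edges, r)
]
adj = hasse_neighbors(graphs, lambda g, h: g <= h)

empty = frozenset()
complete = frozenset(all_edges)
g_ab = frozenset([frozenset(("a", "b"))])
g_bc = frozenset([frozenset(("b", "c"))])

print(poset_distance(adj, empty, complete))
print(poset_distance(adj, g_ab, g_bc))
```

In this toy poset every neighbor step adds or removes a single edge, so the path length equals the number of edges on which the two graphs differ. The brute-force cover computation enumerates the whole poset and is only feasible for very small graph classes; for classes such as DAGs or CPDAGs the ordering and the neighborhood structure are more involved, which is where the class-specific characterizations and bounds become necessary.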
