
Model-oriented Graph Distances via Partially Ordered Sets

Armeen Taeb, F. Richard Guo, Leonard Henckel

TL;DR

The authors address how to define a meaningful distance between graphs across probabilistic and causal graph classes by treating each graph as a statistical model and organizing graphs into a poset under model inclusion. They define the model-oriented distance $d_{\mathcal{L}}$ as the length of the shortest path in the Hasse diagram of the poset, proving it is a metric under suitable conditions and deriving class-specific characterizations. The framework is instantiated for four graph families (UGs, DAGs, CPDAGs, MPDAGs), with structural results (gradedness in some cases, non-gradedness in others), bounds, and efficient A*-based algorithms for computation and bounding. The work highlights substantial differences from traditional SHD and SID, demonstrates tractable simplifications for polytrees, and provides practical algorithms to quantify semantic distances in both probabilistic and causal settings, enabling better benchmarking, consensus finding, and sensitivity analyses in structure learning.

Abstract

A well-defined distance on the parameter space is key to evaluating estimators, ensuring consistency, and building confidence sets. While there are typically standard distances to adopt in a continuous space, this is not the case for combinatorial parameters such as graphs that represent statistical models. Existing proposals like the structural Hamming distance are defined on the graphs rather than the models they represent and can hence lead to undesirable behaviors. We propose a model-oriented framework for defining the distance between graphs that is applicable across many different graph classes. Our approach treats each graph as a statistical model and organizes the graphs in a partially ordered set based on model inclusion. This induces a neighborhood structure, from which we define the model-oriented distance as the length of a shortest path through neighbors, yielding a metric in the space of graphs. We apply this framework to both probabilistic graphical models (e.g., undirected graphs and completed partially directed acyclic graphs) and causal graphical models (e.g., directed acyclic graphs and maximally oriented partially directed acyclic graphs). We analyze the theoretical and empirical behaviors of model-oriented distances. Algorithmic tools are also developed for computing and bounding these distances.
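To make the "shortest path through neighbors" idea concrete, here is a minimal Python sketch for the simplest case, undirected graphs on a small vertex set. It assumes that, for undirected graphs, model inclusion coincides with edge-set inclusion, so two graphs are neighbors exactly when they differ by one edge. That assumption, and the helper names `all_edges`, `neighbors`, and `hasse_distance`, are illustrative and not taken from the paper. The search is a plain breadth-first search, not the A*-based algorithms the paper develops.

```python
from collections import deque
from itertools import combinations


def all_edges(n):
    """All possible undirected edges on vertices 1..n, as sorted pairs."""
    return list(combinations(range(1, n + 1), 2))


def neighbors(graph, edges):
    """Yield graphs that differ from `graph` by exactly one edge.

    Assumption (illustrative): for undirected graphs the covering relation
    of the model-inclusion poset is a single edge addition or removal.
    """
    for e in edges:
        # Symmetric difference toggles edge e: adds it if absent, removes it if present.
        yield graph ^ frozenset([e])


def hasse_distance(source, target, n):
    """Length of a shortest path between two edge sets in the Hasse diagram (BFS)."""
    edges = all_edges(n)
    source, target = frozenset(source), frozenset(target)
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        graph, dist = queue.popleft()
        if graph == target:
            return dist
        for nxt in neighbors(graph, edges):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # unreachable here, since this toy poset is connected


empty = set()
complete = {(1, 2), (1, 3), (2, 3)}
print(hasse_distance(empty, complete, 3))
print(hasse_distance({(1, 2)}, {(2, 3)}, 3))
```

Under this assumption the distance is simply the number of edges on which the two graphs disagree. For the other graph classes the covering relation is more involved, and an exhaustive search like this one would only be feasible for very small vertex sets.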

Paper Structure

This paper contains 62 sections, 30 theorems, 49 equations, 12 figures, 1 table, 5 algorithms.

Key Result

Theorem 1

The model-oriented distance in Definition 2 is a metric if and only if $(\mathfrak{G}, \mathcal{M})$ satisfies conditions \ref{cond:inject} and \ref{cond:connect}.
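For intuition, the metric property can be written out as follows. The two conditions named in Theorem 1 are not spelled out in this summary, so the sketch only assumes that the neighbor relation is symmetric and that any two graphs are linked by a finite chain of neighbors. The chain notation $\mathcal{G}_0, \dots, \mathcal{G}_k$ is introduced here for illustration and is not the paper's.

```latex
% Shortest-path form of the distance (chain notation is an illustrative assumption).
\[
d(\mathcal{G}_s, \mathcal{G}_t)
  = \min\bigl\{\, k \ge 0 :
      \mathcal{G}_0 = \mathcal{G}_s,\ \mathcal{G}_k = \mathcal{G}_t,\
      \mathcal{G}_{i-1} \text{ and } \mathcal{G}_i \text{ are neighbors for } i = 1, \dots, k
    \,\bigr\}
\]
% Metric axioms that such a shortest-path length satisfies whenever the minimum is finite.
\begin{align*}
d(\mathcal{G}_s, \mathcal{G}_t) &= 0 \iff \mathcal{G}_s = \mathcal{G}_t, \\
d(\mathcal{G}_s, \mathcal{G}_t) &= d(\mathcal{G}_t, \mathcal{G}_s), \\
d(\mathcal{G}_s, \mathcal{G}_u) &\le d(\mathcal{G}_s, \mathcal{G}_t) + d(\mathcal{G}_t, \mathcal{G}_u).
\end{align*}
```

Symmetry follows by reversing a chain, and the triangle inequality follows by concatenating two chains. Finiteness is the part that needs the graphs to be connected through neighbors.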

Figures (12)

  • Figure 1: (a–f) Four CPDAGs in \ref{ex:SHD} and their associated BIC scores for synthetically generated data.
  • Figure 2: (a) A CPDAG representing three Markov equivalent DAGs. A subset of two DAGs can be represented by an MPDAG. (b) Structures that are forbidden from a valid MPDAG.
  • Figure 3: Hasse diagrams for different statistical graphs over vertex set $\{1,2,3\}$: each box represents a graph, and each dashed arrow represents a covering relation, i.e., we draw $\mathcal{G}_1 \dashrightarrow \mathcal{G}_2$ pointing upwards if $\mathcal{G}_2$ covers $\mathcal{G}_1$. We can determine whether $\mathcal{G}_1 \preceq \mathcal{G}_2$ by checking whether there is a directed path from $\mathcal{G}_1$ to $\mathcal{G}_2$. In (b) and (d), only part of the poset is shown for simplicity. Each poset has a least element, but not all have a greatest element. The poset in panel (d) is not graded when the graphs have more than three vertices (see \ref{prop:not_graded_mpdags}). This means there is no consistent way to assign ranks to the graphs.
  • Figure 4: Upper and lower semimodularity of a poset. Between any $\mathcal{G}_s, \mathcal{G}_t$ in a lower semimodular poset, a shortest path can always be transformed into a down-up path of the same length.
  • Figure 5: The model-oriented distance between probabilistic CPDAGs $\mathcal{G}_s$ and $\mathcal{G}_t$ is 4, as shown by the shortest path in red. Meanwhile, the up-down distance is 8 (blue path) and the down-up distance is also 8 (orange path).
  • ...and 7 more figures

Theorems & Definitions (66)

  • Example 1
  • Definition 1: Neighbor
  • Definition 2: Model-oriented distance
  • Theorem 1: Metric
  • proof
  • Definition 3: Poset, cover, connectedness, least element, comparability
  • Definition 4: Model-oriented poset
  • Proposition 1
  • Proposition 2
  • proof
  • ...and 56 more