
Generative Modelling of Structurally Constrained Graphs

Manuel Madeira, Clement Vignac, Dorina Thanou, Pascal Frossard

TL;DR

ConStruct addresses the challenge of generating graphs that satisfy hard, domain-specific structural constraints by introducing a constrained graph discrete diffusion framework. It combines an edge-absorbing forward noise model with a property-preserving projector so that both forward and reverse diffusion steps stay within a constrained graph class defined by edge-deletion invariant properties (e.g., planarity, acyclicity). The method performs strongly on synthetic datasets and digital pathology graphs, achieving near-perfect constraint validity and substantial gains in data plausibility (e.g., up to 71.1 percentage points on cell graphs with tertiary lymphoid structures (TLS)). Efficiency improvements via an edge-blocking hash table and incremental constraint checks keep the sampling overhead modest, making constrained diffusion practical for real-world applications such as biomedical graph data augmentation and molecular design.
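
The edge-absorbing forward process can be pictured with a minimal sketch. It assumes that at each step every existing edge independently survives with some probability and is otherwise absorbed into the "no edge" state, and that absent edges are never created. The per-step keep probabilities below are hypothetical placeholders, not the paper's noise schedule, and the node-type noise is omitted. The point it illustrates: every noisy graph is a subgraph of the previous one, so an edge-deletion invariant property of the clean graph holds along the whole forward trajectory.

```python
import random
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def forward_edge_absorbing(edges: Set[Edge], keep_prob: float, rng: random.Random) -> Set[Edge]:
    """One sketched forward step: each edge survives with probability keep_prob.

    Removed edges are absorbed into the 'no edge' state; no new edge is ever added.
    """
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must lie in [0, 1]")
    # rng.random() returns a value in [0, 1), so keep_prob=1 keeps every edge
    # and keep_prob=0 removes every edge.
    return {e for e in edges if rng.random() < keep_prob}


def forward_trajectory(
    edges: Set[Edge], step_keep_probs: Sequence[float], rng: random.Random
) -> List[Set[Edge]]:
    """Apply the sketched step repeatedly; step_keep_probs is a hypothetical schedule."""
    traj = [set(edges)]
    for p in step_keep_probs:
        traj.append(forward_edge_absorbing(traj[-1], p, rng))
    return traj


# Usage: a small tree (acyclic graph) stays acyclic at every step, because
# each graph in the trajectory only loses edges.
rng = random.Random(1)
tree = {(0, 1), (1, 2), (1, 3), (3, 4)}
for step, g in enumerate(forward_trajectory(tree, [0.9, 0.7, 0.5, 0.0], rng)):
    print(step, sorted(g))
```

With a final keep probability of 0, the chain ends in the empty graph, the absorbing state from which the reverse process starts.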

Abstract

Graph diffusion models have emerged as state-of-the-art techniques in graph generation; yet, integrating domain knowledge into these models remains challenging. Domain knowledge is particularly important in real-world scenarios, where invalid generated graphs hinder deployment in practical applications. Unconstrained and conditioned graph diffusion models fail to guarantee such domain-specific structural properties. We present ConStruct, a novel framework that enables graph diffusion models to incorporate hard constraints on specific properties, such as planarity or acyclicity. Our approach ensures that the sampled graphs remain within the domain of graphs that satisfy the specified property throughout the entire trajectory in both the forward and reverse processes. This is achieved by introducing an edge-absorbing noise model and a new projector operator. ConStruct demonstrates versatility across several structural and edge-deletion invariant constraints and achieves state-of-the-art performance for both synthetic benchmarks and attributed real-world datasets. For example, by incorporating planarity constraints in digital pathology graph datasets, the proposed method outperforms existing baselines, improving data validity by up to 71.1 percentage points.
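
The notion of an edge-deletion invariant property (if a graph satisfies it, every graph obtained by deleting edges also does) can be spot-checked on a single graph. The sketch below is illustrative only: the helper names are hypothetical, and checking one graph is not a proof of invariance. It contrasts two invariant properties, acyclicity and a maximum-degree bound, with connectivity, which is not edge-deletion invariant because removing an edge from a tree disconnects it.

```python
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int]
Prop = Callable[[int, Sequence[Edge]], bool]


def _find(parent: List[int], x: int) -> int:
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_acyclic(n: int, edges: Sequence[Edge]) -> bool:
    parent = list(range(n))
    for u, v in edges:
        ru, rv = _find(parent, u), _find(parent, v)
        if ru == rv:  # u and v already connected: this edge closes a cycle
            return False
        parent[ru] = rv
    return True


def max_degree_at_most(k: int) -> Prop:
    def prop(n: int, edges: Sequence[Edge]) -> bool:
        deg = [0] * n
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        return max(deg, default=0) <= k
    return prop


def is_connected(n: int, edges: Sequence[Edge]) -> bool:
    if n == 0:
        return True
    parent = list(range(n))
    for u, v in edges:
        parent[_find(parent, u)] = _find(parent, v)
    root = _find(parent, 0)
    return all(_find(parent, x) == root for x in range(n))


def survives_single_deletions(prop: Prop, n: int, edges: Sequence[Edge]) -> bool:
    """True if prop holds on the graph and after deleting any single edge."""
    if not prop(n, edges):
        return False
    return all(prop(n, [e for e in edges if e != removed]) for removed in edges)


path = [(0, 1), (1, 2), (2, 3)]
print(survives_single_deletions(is_acyclic, 4, path))
print(survives_single_deletions(max_degree_at_most(2), 4, path))
print(survives_single_deletions(is_connected, 4, path))
```

Single-edge deletions suffice for this check because any larger deletion is a sequence of single deletions.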

Paper Structure

This paper contains 59 sections, 3 theorems, 21 equations, 12 figures, 11 tables, 4 algorithms.

Key Result

Theorem 1

(Simplified) Let $\mathcal{G}^{t-1} = \operatorname{Projector}(P, \hat{G}^{t-1}, G^t)$ be the set of all possible one-step denoised graphs outputted by ConStruct. If we define $G^*$ as any optimal solution of: where $\mathcal{C}= \{ G \in \mathcal{G}| P(G) = True, G \supset G^t \}$ and $\mathcal{G}$ is the set of all unattributed graphs, then $G^*$ can be recovered by our projector, i.e., $G^* \i

Figures (12)

  • Figure 1: Constrained graph discrete diffusion framework. The forward process consists of an edge deletion process driven by the edge-absorbing noise model, while the node types may switch according to the marginal noise model. At sampling time, the projector operator ensures that sampled graphs remain within the constrained domain throughout the entire reverse process. In the illustrated example, the constrained domain consists exclusively of graphs with no cycles. We highlight in gray the components responsible for preserving the constraining property.
  • Figure 2: Projector operator. At each iteration, we start by sampling a candidate graph $\hat{G}^{t-1}$ from the distribution $p_\theta(G^{t-1} | G^t)$ provided by the diffusion model. Then, the projector step inserts the candidate edges in a uniformly random order, discarding those that would violate the target property $P$ (acyclicity in this illustration). At the end of the reverse step, we obtain a graph $G^{t-1}$ that is guaranteed to satisfy this property.
  • Figure 3: Examples of different $G^{t-1}$ that can be yielded by $\operatorname{Projector}(P, \hat{G}^{t-1}, G^t)$ for a given $P$ (column "Constraint"), $G^t$, and $\hat{G}^{t-1}$ (columns with the respective names), leading to the insertion of different numbers of edges. In the maximum degree row, the maximum allowed degree is 2. In the $\hat{G}^{t-1}$ column, dashed lines represent the candidate edges. In the $G^{t-1}$ column, green lines denote the edges actually inserted by the projector.
  • Figure 4: Extraction of a cell subgraph (center) from a WSI graph (left). From this cell subgraph, we can then compute the TLS embedding based on the classification of the edges into different categories, shown on the right. We can observe a cluster of B-cells surrounded by some supporting T-cells, characteristic of high TLS content.
  • Figure 5: Distributions of the TLS embedding entries for the low TLS (left) and the high TLS (right) datasets.
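
The projector step described for Figure 2 can be sketched for the acyclicity case, where the incremental constraint check is a union-find query: inserting edge (u, v) keeps a forest acyclic exactly when u and v lie in different trees. The `blocked` set is a hypothetical stand-in for the edge-blocking hash table, assuming it stores rejected edges so later steps skip them. That reuse is sound for any edge-deletion invariant property: if adding an edge to G violates P, adding it to any supergraph of G also violates P, and the reverse process only ever adds edges. Function and class names are illustrative, not the authors' code.

```python
import random
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def canon(e: Edge) -> Edge:
    # Undirected edges are stored with the smaller endpoint first.
    u, v = e
    return (u, v) if u < v else (v, u)


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the trees of a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def project_acyclic(
    n: int, g_t: Iterable[Edge], g_hat: Iterable[Edge],
    blocked: Set[Edge], rng: random.Random,
) -> Set[Edge]:
    """Sketched projector for P = acyclicity; updates `blocked` in place."""
    uf = UnionFind(n)
    result = {canon(e) for e in g_t}
    for u, v in sorted(result):
        if not uf.union(u, v):
            raise ValueError("G^t must already be acyclic")
    candidates = sorted({canon(e) for e in g_hat} - result - blocked)
    rng.shuffle(candidates)  # insert candidate edges in uniformly random order
    for u, v in candidates:
        if uf.union(u, v):  # joins two trees: the graph stays acyclic
            result.add((u, v))
        else:
            blocked.add((u, v))  # would close a cycle, now and in any later step
    return result


rng = random.Random(0)
blocked: Set[Edge] = set()
g_prev = project_acyclic(4, {(0, 1)}, {(0, 1), (1, 2), (0, 2), (2, 3)}, blocked, rng)
print(sorted(g_prev), sorted(blocked))
```

The output always contains $G^t$, so the reverse process, like the forward one, moves monotonically through nested graphs. Other properties such as planarity would need a different incremental check in place of the union-find query.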

Theorems & Definitions (7)

  • Definition 3.1
  • Theorem 1
  • Definition B.1
  • Theorem 1
  • proof
  • Theorem 2
  • proof