Combinatorial Complex Score-based Diffusion Modelling through Stochastic Differential Equations

Adrien Carrel

TL;DR

This work tackles the challenge of generative modeling for complex topologies by introducing Combinatorial Complex Score-based Diffusion (CCSD), a unified framework that formulates generation with stochastic differential equations and produces combinatorial complexes (CCs) rather than only graphs. CCSD extends score-based diffusion to higher-order topologies via lifting procedures (loop-based and path-based) that map graphs into CCs, enabling the generation of hypergraphs and simplicial-like structures while preserving higher-order relations. The framework defines forward diffusion on CC components, learns partial score functions with dedicated neural architectures, and uses reverse-time SDEs or probability-flow ODEs for sampling, including conditional sampling and imputation. The authors provide a theoretical basis for CC representations (Dimension-Constrained CCs and FCCs), present novel CC-specific score networks and Hodge-based metrics, and deliver a Python library to train and sample CCs. Empirically, CCSD achieves competitive performance on molecule and graph generation tasks and demonstrates promising capabilities for higher-order objects, supported by a public software package and extensive evaluation metrics tailored to CCs.

Abstract

Graph structures offer a versatile framework for representing diverse patterns in nature and complex systems, with applications across domains such as molecular chemistry, social networks, and transportation systems. While diffusion models have excelled at generating various objects, generating graphs remains challenging. This thesis explores the potential of score-based generative models for generating such objects by modelling them as combinatorial complexes, which are powerful topological structures that encompass higher-order relationships. We propose a unified framework based on stochastic differential equations. We not only generalize the generation of complex objects such as graphs and hypergraphs, but also unify existing generative modelling approaches such as Score Matching with Langevin Dynamics and Denoising Diffusion Probabilistic Models. This overcomes limitations of existing frameworks that focus solely on graph generation, opening up new possibilities in generative AI. Experimental results show that our framework can generate these complex objects and can also compete with state-of-the-art approaches on plain graph and molecule generation tasks.

Paper Structure

This paper contains 64 sections, 4 theorems, 7 equations, 30 figures, and 7 tables.

Key Result

Theorem 1

Let $z\sim \mathcal{N}(\mu_{z}, \Sigma_{z})$ be a Gaussian variable. Then, we have: $\mathbb{E}\left[\mu_{z} \mid z\right] = z + \Sigma_{z} \nabla_{z}\log p(z)$, where $p$ denotes the marginal density of $z$.
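As a sanity check of Theorem 1, the sketch below works through a scalar toy case that is not taken from the paper. It assumes a Gaussian prior on the mean, $\mu_{z}\sim\mathcal{N}(m_0, s_0^2)$, and a fixed noise variance $\sigma^2$, so that the marginal of $z$ is $\mathcal{N}(m_0, s_0^2+\sigma^2)$ and its score is available in closed form. The function and parameter names (`prior_mean`, `prior_var`, `noise_var`) are hypothetical and chosen only for this illustration.

```python
def marginal_score(z, prior_mean, prior_var, noise_var):
    """Score d/dz log p(z) of the marginal N(prior_mean, prior_var + noise_var)."""
    return -(z - prior_mean) / (prior_var + noise_var)


def tweedie_posterior_mean(z, prior_mean, prior_var, noise_var):
    """E[mu | z] from Tweedie's formula: z + Sigma * score(z)."""
    return z + noise_var * marginal_score(z, prior_mean, prior_var, noise_var)


def conjugate_posterior_mean(z, prior_mean, prior_var, noise_var):
    """E[mu | z] from the standard Gaussian-Gaussian conjugate update."""
    return (prior_var * z + noise_var * prior_mean) / (prior_var + noise_var)


# Assumed toy values: prior N(0, 4), noise variance 1, observation z = 1.5.
z_obs = 1.5
via_tweedie = tweedie_posterior_mean(z_obs, 0.0, 4.0, 1.0)
via_conjugacy = conjugate_posterior_mean(z_obs, 0.0, 4.0, 1.0)
print(via_tweedie, via_conjugacy)
```

Both routes agree up to floating-point error, which is the content of the formula in this special case: the posterior mean can be recovered from the noisy observation and the score of its marginal alone, and a learned score network stands in for that score when it has no closed form.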

Figures (30)

  • Figure 1: Overview of different topological structures. From sets and graphs to the combinatorial complex, this figure presents the hierarchy of topological structures according to how they incorporate higher-order relations in their definitions. Combinatorial complexes generalize all of these objects because they have both part-whole relations and set-type relations hajij_topological_2023. The figure has been adapted from Papillon et al. papillon2023architectures and Hajij et al. hajij_topological_2023.
  • Figure 2: Overview of the ring-based lifting procedure. We start from the graph representation of a molecule, here 1-naphthaleneacetic acid. Once the nodes belonging to a ring are identified, we group them into a rank-2 cell, which is added to the graph to create a combinatorial complex.
  • Figure 3: Overview of the path-based lifting procedure. We start from the graph representation of a molecule, here an adelphan acid (more precisely, Reserpine). We choose one or more source nodes and a path length $k\geq 1$. We identify the nodes that lie on the same path of length $k$ in the graph, where the path starts at one of the source nodes. We group them into a rank-2 cell, which is added to the graph to create a combinatorial complex.
  • Figure 4: Molecule with the longest ring in the ZINC250k dataset irwin_zinc_2012. The molecule has a ring made of 24 atoms.
  • Figure 5: Overview of CCSD. We can map an original combinatorial complex to a noise distribution (the prior) with an SDE, and reverse this SDE for generative modelling. We can also reverse the associated probability flow ODE, which yields a deterministic process that samples from the same distribution as the SDE. Both the reverse-time SDE and probability flow ODE can be obtained by estimating the partial score functions $\left ( \nabla_{\Omega_{r,t}} \log \left ( p_{t}(CC_{t})\right )\right )_{0\leq r\leq R}$. The image of a diffusion background has been adapted from song2021scorebased.
  • ...and 25 more figures
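
The ring-based and path-based lifting procedures described in Figures 2 and 3 can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the CCSD library's implementation: the function names, the adjacency-dict input, and the dict-of-ranks output are all hypothetical. Rings are approximated by the fundamental cycles of a BFS spanning forest, which matches chemical rings on simple examples but is not a full ring-perception algorithm, and the path length $k$ is assumed to count edges.

```python
from collections import deque


def _tree_cycle(parent, u, v):
    """Nodes on the cycle closed by the non-tree edge (u, v)."""
    ancestors_u = []
    x = u
    while x is not None:
        ancestors_u.append(x)
        x = parent[x]
    seen = set(ancestors_u)
    path_v = []
    x = v
    while x not in seen:  # climb from v until reaching an ancestor of u
        path_v.append(x)
        x = parent[x]
    return ancestors_u[: ancestors_u.index(x) + 1] + path_v


def fundamental_cycles(adj):
    """One node set per non-tree edge of a BFS spanning forest (assumed ring proxy)."""
    parent, seen_edges, cycles = {}, set(), []
    for root in adj:
        if root in parent:
            continue
        parent[root] = None
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                edge = frozenset((u, v))
                if edge in seen_edges:
                    continue
                seen_edges.add(edge)
                if v not in parent:  # tree edge
                    parent[v] = u
                    queue.append(v)
                else:  # non-tree edge closes a cycle
                    cycles.append(frozenset(_tree_cycle(parent, u, v)))
    return cycles


def path_cells(adj, sources, k):
    """Node sets of simple paths with k edges that start at a source node."""
    cells = set()

    def extend(path):
        if len(path) == k + 1:
            cells.add(frozenset(path))
            return
        for v in adj[path[-1]]:
            if v not in path:
                extend(path + [v])

    for s in sources:
        extend([s])
    return cells


def lift(adj, rank2_cells):
    """Group cells by rank: nodes (0), edges (1), lifted cells (2)."""
    nodes = {frozenset([u]) for u in adj}
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    return {0: nodes, 1: edges, 2: set(rank2_cells)}


# Toy graph: a six-membered ring (0..5) with one pendant node 6 attached to node 5.
adj = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0, 6], 6: [5]}
ring_cc = lift(adj, fundamental_cycles(adj))
path_cc = lift(adj, path_cells(adj, sources=[6], k=2))
print(ring_cc[2])
print(path_cc[2])
```

On the toy graph, the ring-based lift adds a single rank-2 cell containing the six ring nodes, while the path-based lift from source node 6 with $k=2$ adds the two cells {6, 5, 0} and {6, 5, 4}.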

Theorems & Definitions (60)

  • Theorem 1: Tweedie's formula
  • Definition 1: Diffusion process
  • Definition 2: Standard Wiener Process
  • Definition 3: Neighborhood function
  • Definition 4: Neighborhood topology
  • Definition 5: Topological space
  • Definition 6: Undirected Graph
  • Definition 7: Hypergraph
  • Definition 8: Simplicial complex
  • Remark
  • ...and 50 more