DLGNet: Hyperedge Classification through Directed Line Graphs for Chemical Reactions

Stefano Fiorini, Giulia M. Bovolenta, Stefano Coniglio, Michele Ciavotta, Pietro Morerio, Michele Parrinello, Alessio Del Bue

TL;DR

DLGNet addresses reaction classification by modeling reactions as directed hyperedges and performing spectral convolution on hyperedge features via the Directed Line Graph Laplacian $\mathbb{\vec{L}}_N$, a Hermitian, positive semidefinite operator. The method constructs a Directed Line Graph DLG$(\vec{H})$, derives the associated Laplacian, and embeds it into a complex-valued GNN that operates on hyperedges, with an unwind step to real-valued outputs for classification. Empirically, DLGNet attains strong gains over 12 baselines across three real-world chemical-reaction datasets, with an average relative improvement of $33.01\%$ and up to $37.71\%$ on Dataset-3, and ablation studies confirm the necessity of directionality and the Laplacian design. This framework advances hypergraph learning for chemistry by enabling principled, direction-aware, spectral processing of higher-order interactions, with potential applications in retrosynthesis and reaction discovery.

Abstract

Graphs and hypergraphs provide powerful abstractions for modeling interactions among a set of entities of interest and have attracted growing interest in the literature thanks to many successful applications in several fields. In particular, they are rapidly expanding in domains such as chemistry and biology, especially in the areas of drug discovery and molecule generation. One of the areas witnessing the fastest growth is the chemical reactions field, where chemical reactions can be naturally encoded as directed hyperedges of a hypergraph. In this paper, we address the chemical reaction classification problem by introducing the notion of a Directed Line Graph (DLG) associated with a given directed hypergraph. On top of it, we build the Directed Line Graph Network (DLGNet), the first spectral-based Graph Neural Network (GNN) expressly designed to operate on a hypergraph via its DLG transformation. The foundation of DLGNet is a novel Hermitian matrix, the Directed Line Graph Laplacian, which compactly encodes the directionality of the interactions taking place within the directed hyperedges of the hypergraph thanks to the DLG representation. The Directed Line Graph Laplacian enjoys many desirable properties, including admitting an eigenvalue decomposition and being positive semidefinite, which make it well suited for adoption within a spectral-based GNN. Through extensive experiments on chemical reaction datasets, we show that DLGNet significantly outperforms the existing approaches, achieving on a collection of real-world datasets an average relative-percentage-difference improvement of 33.01%, with a maximum improvement of 37.71%.
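To make the two properties named in the abstract concrete (Hermitian and positive semidefinite), the following is a minimal numerical sketch, not the paper's implementation. It assumes a complex incidence matrix with entry `1` for head nodes and `-1j` for tail nodes, unit hyperedge weights, and a normalized operator of the form $I - D_e^{-1/2} B^{*} D_v^{-1} B D_e^{-1/2}$. The encoding, the normalization, the function name `directed_line_graph_laplacian` and the toy hypergraph are all illustrative assumptions rather than definitions taken from this page.

```python
import numpy as np

def directed_line_graph_laplacian(B):
    """Hermitian Laplacian over hyperedges from a complex incidence matrix.

    B has shape (n_nodes, n_hyperedges). ASSUMED form (not from the source):
    L = I - D_e^{-1/2} B^H D_v^{-1} B D_e^{-1/2}, with unit hyperedge weights.
    """
    B = np.asarray(B, dtype=complex)
    support = np.abs(B) > 0
    d_v = support.sum(axis=1)  # number of hyperedges touching each node
    d_e = support.sum(axis=0)  # number of nodes in each hyperedge
    if (d_v == 0).any() or (d_e == 0).any():
        raise ValueError("isolated nodes or empty hyperedges are not handled")
    M = B / np.sqrt(np.outer(d_v, d_e))  # D_v^{-1/2} B D_e^{-1/2}
    return np.eye(B.shape[1]) - M.conj().T @ M

# Toy directed hypergraph: 4 nodes (rows), 3 hyperedges (columns).
# ASSUMED encoding: head node -> 1, tail node -> -1j, absent -> 0.
B = np.array([
    [-1j, 0, 1],
    [-1j, 0, -1j],
    [1, -1j, 0],
    [0, 1, -1j],
])

L = directed_line_graph_laplacian(B)
is_hermitian = np.allclose(L, L.conj().T)
eigenvalues = np.linalg.eigvalsh(L)  # real, because L is Hermitian
print(is_hermitian, eigenvalues.min() >= -1e-9)
```

Under these assumptions the operator is indexed by hyperedges (one row and column per reaction), its off-diagonal entries are complex wherever two hyperedges share a node with different roles, and its spectrum is real and non-negative, which is what a spectral convolution requires.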

Paper Structure

This paper contains 24 sections, 6 theorems, 25 equations, 6 figures, 5 tables.

Key Result

Theorem 1

If $\vec{H}$ is undirected (i.e., $\vec{H} = H$), then $\mathbb{\vec{L}}_N = \mathbb{L}_N$ and $\mathbb{\vec{Q}}_{N} = \mathbb{Q}_{N}$ hold.
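The statement can be illustrated numerically with a hedged sketch. The definitions of $\mathbb{\vec{L}}_N$ and $\mathbb{L}_N$ are not given on this page, so the code assumes a complex incidence encoding (head node `1`, tail node `-1j`), an undirected hyperedge represented by placing all of its nodes in the head, and a normalized operator $I - D_e^{-1/2} B^{*} D_v^{-1} B D_e^{-1/2}$. The helper names `incidence` and `line_graph_laplacian` are hypothetical.

```python
import numpy as np

def incidence(n_nodes, hyperedges):
    """Complex incidence matrix from (tail_nodes, head_nodes) pairs.

    ASSUMED encoding (not from the source): head -> 1, tail -> -1j.
    """
    B = np.zeros((n_nodes, len(hyperedges)), dtype=complex)
    for e, (tail, head) in enumerate(hyperedges):
        for v in head:
            B[v, e] = 1.0
        for v in tail:
            B[v, e] = -1j
    return B

def line_graph_laplacian(B):
    """ASSUMED form: I - D_e^{-1/2} B^H D_v^{-1} B D_e^{-1/2}."""
    support = np.abs(B) > 0
    d_v = support.sum(axis=1)
    d_e = support.sum(axis=0)
    M = B / np.sqrt(np.outer(d_v, d_e))
    return np.eye(B.shape[1]) - M.conj().T @ M

# Toy example: three reactions over four species.
directed = [({0, 1}, {2}), ({2}, {3}), ({1, 3}, {0})]
# Same hyperedges with direction dropped: every node goes to the head.
undirected = [(set(), tail | head) for tail, head in directed]

L_directed = line_graph_laplacian(incidence(4, directed))
L_undirected = line_graph_laplacian(incidence(4, undirected))

# Real-valued operator of the same form, built from the 0/1 incidence matrix.
A = (np.abs(incidence(4, directed)) > 0).astype(float)
D_v_inv = np.diag(1.0 / A.sum(axis=1))
D_e_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=0)))
L_real = np.eye(A.shape[1]) - D_e_inv_sqrt @ A.T @ D_v_inv @ A @ D_e_inv_sqrt

print(np.allclose(L_undirected, L_real), np.allclose(L_directed, L_real))
```

With direction dropped, the complex-valued operator has no imaginary part and matches the real-valued one; with direction kept, the two differ. This is only a consistency check under the stated assumptions, not a proof of the theorem.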

Figures (6)

  • Figure 1: Transformation from the directed hypergraph (left) to the directed line graph (right). The hyperedges of $\vec{H}$ become the nodes of DLG$(\vec{H})$ and are connected if they overlap in $\vec{H}$. Complex-valued edge weights in DLG$(\vec{H})$ encode $\vec{H}$'s directionality, as detailed in Section \ref{sec:method}.
  • Figure 2: (Upper panel, left): example from Dataset-1. C--C bond formation via reaction of alkyne with alkyl halide; only bi-molecular reactant and main product are taken into account (any byproduct is omitted). (Upper panel, right): example from Dataset-2. C--N bond formation via Buchwald-Hartwig amination; apart from bi-molecular reactant (amine and aryl halide) and main product, catalyst (palladium compound), solvent (dioxane) and base (sodium tert-butoxide) structures are also present. Chemical elements: carbon (C), nitrogen (N), oxygen (O), hydrogen (H), chlorine (Cl), iodine (I), sodium (Na), phosphorus (P) and palladium (Pd). Single, double and triple black lines: bonds between C atoms. H, T: Head and Tail of the directed hypergraph. (Lower panel): schematic representation of Dataset-3 elements. Left side: reactants; right side: competing outcomes, either bimolecular nucleophilic substitution (S$_\mathrm{N}$2) or bimolecular elimination (E2). Thus, each element is composed either of a bi-molecular reactant and a bi-molecular product (S$_\mathrm{N}$2 class), or a bi-molecular reactant and a tri-molecular product (E2 class). X and Y: leaving group and nucleophile. Groups A-D: different substituents attached to the alkane carbon backbone (black).
  • Figure 3: Ball-and-stick 3D model of Dataset-1 mislabeled pairs of reaction classes. Color code: grey for carbon, red for oxygen, blue for nitrogen, purple for iodine, green for chlorine, light green for fluorine, brown for bromine, and white for hydrogen. (Left panel, upper): Reduction from an ester to an alcohol substituent on a six-carbon ring. (Left panel, lower): Functional group interconversion from carboxyl to carbonyl group in the analogous hexagonal structure. (Right panel, upper): arylation reaction between an amine compound and an aryl halide, yielding a C--N bond in the final product. (Right panel, lower): heterocycle formation via amide intramolecular condensation, producing a hexagonal ring containing a heteroatom (nitrogen).
  • Figure 4: Dataset-1 confusion matrix.
  • Figure 5: Dataset-2 confusion matrix.
  • ...and 1 more figure

Theorems & Definitions (11)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Theorem 3
  • Corollary 2
  • Proposition 1
  • proof
  • proof
  • proof
  • ...and 1 more