D-GRIL: End-to-End Topological Learning with 2-parameter Persistence

Soham Mukherjee; Shreyas N. Samaga; Cheng Xin; Steve Oudot; Tamal K. Dey

TL;DR

This work extends end-to-end topological learning from $1$-parameter to $2$-parameter persistence by introducing D-Gril, a differentiable layer that learns a bifiltration using the Gril vectorization of $2$-parameter persistence modules. It establishes that Gril is piecewise affine, with an explicit differential on the top-dimensional strata, and proves convergence of stochastic sub-gradient descent when Gril is composed with definable losses, which justifies backpropagation through Gril. The framework is demonstrated on benchmark graph datasets and bio-activity prediction tasks, where learned bifiltrations give competitive or improved performance and faster training than existing multiparameter methods. Overall, learning filtration functions end-to-end with Gril yields topological representations that improve graph-based prediction while keeping optimization tractable.
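The claim that Gril is piecewise affine with an explicit differential on top-dimensional strata can be illustrated on a toy map. The sketch below is a hypothetical stand-in, not Gril itself: it takes the minimum of three made-up affine pieces on $\mathbb{R}^2$. At a point where exactly one piece is active (the analogue of lying inside a top-dimensional stratum), the gradient is simply the linear part of that piece, which a central finite difference confirms.

```python
import numpy as np

# Hypothetical toy piecewise affine map R^2 -> R: min_i (a_i . x + b_i).
# This only illustrates the structure; it is NOT the Gril vectorization.
A = np.array([[1.0, 0.0], [0.0, -1.0], [-1.0, 1.0]])
b = np.array([0.0, 2.0, 1.0])

def pa_value(x):
    return float(np.min(A @ x + b))

def pa_grad(x):
    # Inside a top-dimensional stratum exactly one piece is active,
    # so the differential is the linear part of that piece.
    i = int(np.argmin(A @ x + b))
    return A[i].copy()

def finite_diff_grad(f, x, h=1e-6):
    # Central differences, used only to check pa_grad numerically.
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x0 = np.array([0.3, 0.5])  # generic point: only piece 0 is active here
print(pa_value(x0), pa_grad(x0), finite_diff_grad(pa_value, x0))
```

At points where two pieces tie, the map is generally not differentiable and a sub-gradient has to be chosen instead, which is why the convergence analysis is phrased in terms of stochastic sub-gradient descent.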

Abstract

End-to-end topological learning using 1-parameter persistence is well known. We show that the framework can be enhanced with 2-parameter persistence by adopting a recently introduced 2-parameter-persistence-based vectorization technique called GRIL. We establish a theoretical foundation for differentiating GRIL, which yields D-GRIL. We show that D-GRIL can be used to learn a bifiltration function on standard benchmark graph datasets. Further, we show that this framework can be applied to bio-activity prediction in drug discovery.

Paper Structure (20 sections, 7 theorems, 14 equations, 4 figures, 6 tables)

Introduction
Overview
Background
Multiparameter persistent homology
Stochastic sub-gradient descent
o-minimal geometry
Differentiability of Gril
Gril as a piecewise affine map
Stochastic sub-gradient descent
Differential of Gril
Practical Considerations
Experiments
Experimental Setup
Bio-activity Prediction Datasets
Benchmark Graph Datasets
...and 5 more sections

Key Result

Proposition 3.5

Let $f\colon \mathbb{R}^d \to \mathbb{R}$ be a locally Lipschitz function that is $C^d$-stratifiable. Consider the iterates $\{ \mathbf{x}_k\}_{k\geq 1}$ produced by the stochastic sub-gradient method (Eq. eq:stoc_subgrad), and suppose that Assumption C of davis_stochastic holds. Then, almost surely, every
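Proposition 3.5 concerns the iterates of the stochastic sub-gradient method. Equation eq:stoc_subgrad is not reproduced in this section, so the sketch below assumes the standard form $\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k(\mathbf{y}_k + \xi_k)$, where $\mathbf{y}_k$ is a sub-gradient of $f$ at $\mathbf{x}_k$, $\xi_k$ is zero-mean noise, and $\alpha_k = 1/(k+1)$. The objective $f(\mathbf{x}) = |x_1| + |x_2 - 1|$ is a hypothetical locally Lipschitz, piecewise affine example and not a Gril-based loss; since it is convex, its only stationary point is the minimizer $(0, 1)$.

```python
import numpy as np

def f(x):
    # Hypothetical nonsmooth, locally Lipschitz, piecewise affine objective.
    return abs(x[0]) + abs(x[1] - 1.0)

def subgrad(x):
    # np.sign returns 0 at a kink; 0 lies in the subdifferential [-1, 1] there.
    return np.array([np.sign(x[0]), np.sign(x[1] - 1.0)])

def stochastic_subgradient(x0, n_iter=20000, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        alpha = 1.0 / (k + 1)                       # sum diverges, sum of squares converges
        xi = noise * rng.standard_normal(x.shape)  # zero-mean noise term
        x = x - alpha * (subgrad(x) + xi)           # x_{k+1} = x_k - alpha_k (y_k + xi_k)
    return x

x_final = stochastic_subgradient([3.0, -2.0])
print(x_final, f(x_final))
```

The step sizes $\alpha_k = 1/(k+1)$ make $\sum_k \alpha_k$ diverge while $\sum_k \alpha_k^2$ stays finite, the usual regime for results of this kind; the precise conditions of Assumption C are not restated here.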

Figures (4)

Figure 1: A 2-worm with its lower boundary colored blue and its upper boundary colored red. The figure also shows the possible cases of constraining simplex coordinates. $\sigma_1$ illustrates a lower $x$-constraining simplex coordinate: its $x$-coordinate $\sigma_1^x$ constrains the lower boundary of the worm and prevents the worm from expanding further to the left. $\sigma_2$ illustrates a lower $y$-constraining simplex coordinate: $\sigma_2^y$ also constrains the lower boundary of the worm and prevents the worm from expanding downwards. Similarly, $\sigma_3$ and $\sigma_4$ are upper $x$-constraining and upper $y$-constraining, respectively. The arrows depict the gradient directions described in Theorem \ref{thm:gril_diff}.
Figure 2: An intuitive explanation of the gradient assignment described in Theorem \ref{thm:gril_diff}. In the left panel, the $y$-coordinate $\sigma^y$ is at distance $d$ from the $y$-coordinate $\mathbf{p}^y$ of $\mathbf{p}$, and $\sigma$ is the only constraining simplex for the $2$-worm. In the right panel, $\sigma^y$ has decreased by $\epsilon$ and is now at distance $d+\epsilon$ from $\mathbf{p}^y$. As a consequence, the value of Gril increases from $d$ to $d + \epsilon$. Thus, $\frac{\partial \Lambda^{\mathbf{p}}_{k,\ell}(\mathbf{v}_f)}{\partial \sigma^y} = -1$.
Figure 3: Architecture for bio-activity prediction. In contrast to the standard multiparameter pipeline, the bifiltration function $f$ is learnt; $\oplus$ denotes concatenation of vectors.
Figure 4: Comparison of the learnt bifiltration function with the Heat-Kernel Signature-Ricci Curvature (HKS-RC) bifiltration on two randomly selected graph instances (838 and 219) of the Proteins dataset, which have labels 1 and 0, respectively. The first column plots the bifiltration function on the vertices of these graphs; the learnt bifiltration function is visibly different from the HKS-RC bifiltration. The second and third columns show the Gril vectors as heatmaps for $H_0$ and $H_1$, respectively, and these signatures also differ markedly. This provides some evidence that the model learns a bifiltration function quite different from HKS-RC, which is one of the common choices of bifiltration function on graphs.
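The arithmetic behind Figure 2 can be checked numerically. The sketch below hard-codes a hypothetical local model of that single configuration only: $\sigma$ is the sole constraining simplex, it constrains the lower boundary in $y$, and the Gril value equals the distance $\mathbf{p}^y - \sigma^y$. The numeric values are made up, and the sketch does not implement Gril or the other constraining cases of the theorem referenced in Figures 1 and 2.

```python
def gril_value_local(p_y, sigma_y):
    # Hypothetical local model of Figure 2 (not Gril itself): the value is
    # the distance d = p^y - sigma^y to the single lower y-constraining simplex.
    return p_y - sigma_y

p_y, sigma_y, eps = 0.8, 0.5, 1e-3            # made-up coordinates and perturbation
d = gril_value_local(p_y, sigma_y)             # left panel: value d
d_eps = gril_value_local(p_y, sigma_y - eps)   # right panel: sigma^y reduced by eps
partial = (d_eps - d) / (-eps)                 # change in value / change in sigma^y
print(d, d_eps, partial)
```

Lowering $\sigma^y$ by $\epsilon$ raises the value by $\epsilon$, so the difference quotient is $-1$, matching the caption.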

Theorems & Definitions (22)

Definition 3.1: Bifiltration
Definition 3.2: 2-parameter persistence module
Definition 3.3: Generalized Rank
Remark 3.4
Proposition 3.5: Corollary 5.9 davis_stochastic
Definition 3.6: o-minimal structure
Definition 4.1: discrete $\ell$-worm, gril23
Definition 4.2: Gril, gril23
Proposition 4.3
Proposition 4.4
...and 12 more
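Among the definitions listed above is that of a bifiltration (Definition 3.1). As an illustration only, the sketch below builds one common kind of bifiltration on a graph, a lower-star extension: each vertex carries a hypothetical 2D value, and each edge receives the coordinate-wise maximum of its endpoints' values, so every face enters no later than its cofaces in the product order on $\mathbb{R}^2$. The resulting pairs play the role of the simplex coordinates $\sigma^x, \sigma^y$ in Figures 1 and 2. Whether D-GRIL extends its learnt vertex function in exactly this way is an assumption here, not something stated in this section.

```python
from itertools import combinations

def lower_star_bifiltration(vertex_values, simplices):
    """Assign each simplex the coordinate-wise max of its vertices' 2D values."""
    out = {}
    for s in simplices:
        xs = [vertex_values[v][0] for v in s]
        ys = [vertex_values[v][1] for v in s]
        out[tuple(sorted(s))] = (max(xs), max(ys))
    return out

def is_bifiltration(values):
    # Every face must appear no later than its coface in both coordinates.
    for s, (sx, sy) in values.items():
        for r in range(1, len(s)):
            for face in combinations(s, r):
                if face in values:
                    fx, fy = values[face]
                    if fx > sx or fy > sy:
                        return False
    return True

# Hypothetical 2D vertex values on a triangle graph (vertices and edges only).
f = {0: (0.1, 0.7), 1: (0.4, 0.2), 2: (0.3, 0.5)}
K = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2)]
vals = lower_star_bifiltration(f, K)
print(vals, is_bifiltration(vals))
```

Because the edge values are maxima of the vertex values, the monotonicity check always passes for this construction; overwriting a vertex value by hand, so that it exceeds an incident edge's value, makes it fail.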
