Attending to Topological Spaces: The Cellular Transformer

Rubén Ballester; Pablo Hernández-García; Mathilde Papillon; Claudio Battiloro; Nina Miolane; Tolga Birdal; Carles Casacuberta; Sergio Escalera; Mustafa Hajij

TL;DR

The Cellular Transformer (CT) extends transformer attention to $2$-dimensional regular cell complexes, enabling self- and cross-attention across multiple cell ranks via incidence relations and cochain signals. It introduces pairwise and general cellular attention, plus topological positional encodings (BSPe, RWPe, TopoSlepiansPE) that embed structural information, and demonstrates state-of-the-art or competitive performance on datasets lifted to cell complexes without tricks such as virtual nodes or rewiring. The work finds that global positional encodings often outperform local ones, and that pairwise attention excels when features are heterogeneous across ranks, while general attention suits homogeneous settings. Overall, CT bridges topological deep learning and transformer architectures, opening avenues for richer higher-order representations and broader applications of cell-complex data.
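
To make "cochain signals" and "incidence relations" concrete, the following is a minimal NumPy sketch of a toy $2$-dimensional cell complex with five vertices, five edges, and one $2$-cell, loosely in the spirit of Figure 1 (the figure's exact layout is not given here, so the square $2$-cell, the edge orientations, and all variable names are hypothetical). It builds the signed boundary matrices $B_1$ and $B_2$, checks that $B_1 B_2 = 0$, and attaches one cochain per rank with feature dimensions $d_0=4$, $d_1=3$, $d_2=2$.

```python
import numpy as np

# Hypothetical toy complex: 5 vertices, 5 oriented edges, one square 2-cell.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]   # edge k is oriented u -> v
face = [0, 1, 2, 3]                                # 2-cell bounded by edges 0..3

n0, n1, n2 = 5, len(edges), 1

# Signed boundary (incidence) matrix B1: vertices x edges.
B1 = np.zeros((n0, n1))
for k, (u, v) in enumerate(edges):
    B1[u, k] = -1.0
    B1[v, k] = 1.0

# Signed boundary matrix B2: edges x 2-cells. The cycle 0->1->2->3->0
# traverses every boundary edge along its own orientation, so all signs are +1.
B2 = np.zeros((n1, n2))
for k in face:
    B2[k, 0] = 1.0

# The boundary of a boundary vanishes.
assert np.allclose(B1 @ B2, 0.0)

# Cochains: one feature row per k-cell, with d_0=4, d_1=3, d_2=2.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(n0, 4))
X1 = rng.normal(size=(n1, 3))
X2 = rng.normal(size=(n2, 2))

# Unsigned incidence lets a cell read signals from a neighbouring rank,
# e.g. each vertex sums the 1-cochain rows of the edges it bounds.
edge_to_node = np.abs(B1) @ X1
print(X0.shape, X1.shape, X2.shape, edge_to_node.shape)
# (5, 4) (5, 3) (1, 2) (5, 3)
```

Cochains of different ranks live in matrices of different shapes, so any interaction between ranks has to go through an incidence-derived matrix such as $|B_1|$; this is the kind of structure that cellular attention exploits.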

Abstract

Topological Deep Learning seeks to enhance the predictive performance of neural network models by harnessing topological structures in input data. Topological neural networks operate on spaces such as cell complexes and hypergraphs, which can be seen as generalizations of graphs. In this work, we introduce the Cellular Transformer (CT), a novel architecture that generalizes graph-based transformers to cell complexes. First, we propose a new formulation of the usual self- and cross-attention mechanisms, tailored to leverage incidence relations in cell complexes, e.g., edge-face and node-edge relations. Second, we propose a set of topological positional encodings specifically designed for cell complexes. In experiments on three graph datasets transformed into cell complex datasets, CT not only achieves state-of-the-art performance but does so without more complex enhancements such as virtual nodes, in-domain structural encodings, or graph rewiring.
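
As an illustration of cross-attention guided by incidence relations, here is a minimal single-head sketch in which node features (targets) attend to edge features (sources), with the unsigned node-edge incidence matrix acting as the neighborhood matrix $\mathbf{N}$. The exact way the paper injects $\mathbf{N}$ as a bias is not reproduced here; this sketch assumes a hard mask (bias $0$ where cells are incident, $-\infty$ elsewhere), and the function name `cellular_cross_attention`, the random projection weights, and the path-graph example are all hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cellular_cross_attention(X_t, X_s, N, Wq, Wk, Wv):
    """Hypothetical cross-attention from source k_s-cells to target k_t-cells.

    X_t: (n_t, d_t) target cochain, X_s: (n_s, d_s) source cochain,
    N:   (n_t, n_s) neighborhood matrix, used here as a hard mask (assumption).
    Every row of N needs at least one nonzero entry, otherwise that row is NaN.
    """
    Q, K, V = X_t @ Wq, X_s @ Wk, X_s @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    bias = np.where(N != 0, 0.0, -np.inf)
    A = softmax(scores + bias, axis=1)
    return A @ V, A

# Path graph 0 - 1 - 2 with two oriented edges; nodes attend to incident edges.
B1 = np.array([[-1.0, 0.0],
               [1.0, -1.0],
               [0.0, 1.0]])
rng = np.random.default_rng(0)
X0 = rng.normal(size=(3, 4))   # node cochain, d_0 = 4
X1 = rng.normal(size=(2, 3))   # edge cochain, d_1 = 3
h = 8                          # hypothetical attention width
Wq = rng.normal(size=(4, h))
Wk = rng.normal(size=(3, h))
Wv = rng.normal(size=(3, h))

out, A = cellular_cross_attention(X0, X1, np.abs(B1), Wq, Wk, Wv)
print(A.round(3))   # rows 0 and 2 are one-hot; row 1 mixes both edges
```

The same function covers self-attention within one rank by passing the same cochain as `X_t` and `X_s` together with a same-rank neighborhood matrix, for instance `B1.T @ B1`, whose nonzero pattern links edges that share a vertex (plus each edge to itself).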

Paper Structure (42 sections, 18 equations, 5 figures, 4 tables)

Introduction
Contributions
Related Work
Graph transformers
Higher-order transformers
Non-transformer topological neural networks
Cell Complexes
Data on cell complexes: cochain spaces
The Cellular Transformer
Overview
The cellular attention layer and cellular transformer architecture
Pairwise cellular attention
General cellular attention
Tensor diagrams for cellular transformers
Positional encodings on cellular complexes

Figures (5)

Figure 1: Illustration of an annotated cell complex. Left: An annotated cell complex $\mathcal{X}$ consisting of five vertices, five edges, and one $2$-cell. Center: $\mathcal{X}_{k}$ is the collection of $k$-cells of $\mathcal{X}$ for $k=0,1,2$. Right: Rows depict values of a cochain $\mathbf{X}_k$ for each $k$, of dimensions $d_0=4$, $d_1=3$ and $d_2=2$.
Figure 2: Tensor diagram illustrating the flow of signals between cochains defined on $0$-, $1$-, and $2$-cells. For pairwise attention (see the section Pairwise cellular attention), the neighborhood matrices indicate the bias $\mathbf{N}$ in the attention formula. For general attention (see the section General cellular attention), the neighborhood matrices indicate how to build the bias matrix $\mathbf{N}$ by composition of smaller bias matrices $\mathbf{N}_{k_s\to k_t}$ between dimensions.
Figure 3: Left: A cell complex $\mathcal{X}$. Center: Barycentric subdivision of $\mathcal{X}$. Right: $1$-skeleton of the barycentric subdivision. Each original cell of $\mathcal{X}$ is represented by a node in the $1$-skeleton.
Figure 4: BSPe positional encoding of length three for a cell complex with two $2$-cells. To turn a positional encoding into a colour, we normalize each of its coordinates to the $[0,1]$ range and read the result as RGB values. Note that nearby cells are assigned similar colours.
Figure 5: Differences between RWBSPe and RWPe random walks. RWBSPe random walks can jump from a cell to all its incident and coincident cells, while RWPe random walks can jump from a cell to all its upper and lower adjacent cells.
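
Figures 3 and 4 can be reproduced in spirit with a short sketch. It builds the $1$-skeleton of the barycentric subdivision of a toy complex (one node per cell, and an edge between two cells whenever one is a face of the other, so a vertex of a $2$-cell is joined to that $2$-cell directly), then takes the three lowest nontrivial Laplacian eigenvectors as a length-three encoding and colours cells as in Figure 4. This assumes BSPe is a Laplacian-eigenvector encoding computed on that graph; the toy complex and every name below are hypothetical.

```python
import numpy as np

# Hypothetical toy complex: 5 vertices, 5 edges, one square 2-cell.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
faces = [[0, 1, 2, 3]]          # each 2-cell listed by its boundary edge indices
n_vertices = 5

# One graph node per cell of the complex (the barycentre of that cell).
cells = ([("v", i) for i in range(n_vertices)]
         + [("e", k) for k in range(len(edges))]
         + [("f", j) for j in range(len(faces))])
index = {c: i for i, c in enumerate(cells)}
n = len(cells)
A = np.zeros((n, n))

def connect(a, b):
    # Undirected edge of the barycentric 1-skeleton between two comparable cells.
    A[index[a], index[b]] = A[index[b], index[a]] = 1.0

for k, (u, v) in enumerate(edges):
    connect(("v", u), ("e", k))
    connect(("v", v), ("e", k))
for j, boundary in enumerate(faces):
    face_vertices = set()
    for k in boundary:
        connect(("e", k), ("f", j))
        face_vertices.update(edges[k])
    for u in face_vertices:
        connect(("v", u), ("f", j))   # vertices are faces of the 2-cell too

# Laplacian eigenvectors of the 1-skeleton as a positional encoding of length 3.
L = np.diag(A.sum(axis=1)) - A
evals, evecs = np.linalg.eigh(L)
pe = evecs[:, 1:4]              # drop the constant eigenvector (graph is connected)

# Figure 4-style colouring: rescale each coordinate to [0, 1] and read as RGB.
pe_min, pe_max = pe.min(axis=0), pe.max(axis=0)
rgb = (pe - pe_min) / (pe_max - pe_min)
for cell, colour in zip(cells, rgb):
    print(cell, colour.round(2))
```

Laplacian eigenvectors are determined only up to sign (and up to a choice of basis when eigenvalues repeat), so the printed colours may differ between platforms; sign flipping during training is a common way to make models robust to this.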

Theorems & Definitions (2)

Example B.1: Large eigenvalues extending LapPE to simplicial complexes using the unnormalized Hodge Laplacian
Example B.2: Rayleigh quotient of the Hodge Laplacian does not produce a gradient of arbitrary dimensional cells
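
Both examples involve the unnormalized Hodge Laplacian. For experimentation, the sketch below assembles $L_0 = B_1 B_1^\top$, $L_1 = B_1^\top B_1 + B_2 B_2^\top$, and $L_2 = B_2^\top B_2$ from the boundary matrices of a hypothetical toy complex and counts zero eigenvalues, which by Hodge theory equal the Betti numbers. It does not reproduce the content of Examples B.1 or B.2; the complex and helper names are illustrative only.

```python
import numpy as np

# Hypothetical toy complex: a filled square 0-1-2-3 plus a dangling edge 3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
B1 = np.zeros((5, len(edges)))
for k, (u, v) in enumerate(edges):
    B1[u, k], B1[v, k] = -1.0, 1.0
B2 = np.zeros((len(edges), 1))
B2[[0, 1, 2, 3], 0] = 1.0        # square 2-cell with boundary 0->1->2->3->0

def hodge_laplacians(B1, B2):
    # Unnormalized Hodge Laplacians of a 2-dimensional complex.
    L0 = B1 @ B1.T
    L1 = B1.T @ B1 + B2 @ B2.T
    L2 = B2.T @ B2
    return L0, L1, L2

def num_zero_eigs(L, tol=1e-9):
    # Dimension of the kernel, i.e. the corresponding Betti number.
    return int(np.sum(np.abs(np.linalg.eigvalsh(L)) < tol))

L0, L1, L2 = hodge_laplacians(B1, B2)
print(num_zero_eigs(L0), num_zero_eigs(L1), num_zero_eigs(L2))   # 1 0 0

# Without the 2-cell, the square becomes an unfilled 1-cycle.
L1_open = B1.T @ B1
print(num_zero_eigs(L1_open))                                   # 1
```

Filling the square removes the only $1$-dimensional hole, which is why $L_1$ loses its zero eigenvalue once the $2$-cell term $B_2 B_2^\top$ is included.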
