CellCLAT: Preserving Topology and Trimming Redundancy in Self-Supervised Cellular Contrastive Learning

Bin Qin, Qirui Ji, Jiangmeng Li, Yupeng Wang, Xuesong Wu, Jianwen Cao, Fanjiang Xu

TL;DR

CellCLAT tackles self-supervised learning in cellular topological deep learning by preserving higher-order topology while removing redundant topological information. It combines a topology-preserving, parameter-perturbation augmentation with a bi-level meta-learning framework that adaptively trims 2-cells, yielding refined representations. Theoretical results show that CCNNs are strictly more expressive than the WL test and that topological redundancy acts as a confounder, which is addressed via do-calculus-inspired adjustments. Empirically, CellCLAT achieves state-of-the-art performance on six TU datasets in both unsupervised and semi-supervised settings, with ablations supporting the utility of 2-cell trimming and topology-preserving augmentation.

Abstract

Self-supervised topological deep learning (TDL) is a nascent and underexplored area with significant potential for modeling higher-order interactions in simplicial complexes and cellular complexes to derive representations of unlabeled graphs. Compared to simplicial complexes, cellular complexes exhibit greater expressive power. However, progress in self-supervised learning for cellular TDL is largely hindered by two core challenges: extrinsic structural constraints inherent to cellular complexes, and intrinsic semantic redundancy in cellular representations. The first challenge means that traditional graph augmentation techniques may compromise the integrity of higher-order cellular interactions, while the second means that topological redundancy in cellular complexes potentially diminishes task-relevant information. To address these issues, we introduce Cellular Complex Contrastive Learning with Adaptive Trimming (CellCLAT), a twofold framework designed to adhere to the combinatorial constraints of cellular complexes while mitigating informational redundancy. Specifically, we propose a parameter perturbation-based augmentation method that injects controlled noise into cellular interactions without altering the underlying cellular structures, thereby preserving cellular topology during contrastive learning. Additionally, a cellular trimming scheduler is employed to mask gradient contributions from task-irrelevant cells through a bi-level meta-learning approach, effectively removing redundant topological elements while maintaining critical higher-order semantics. We provide theoretical justification and empirical validation to demonstrate that CellCLAT achieves substantial improvements over existing self-supervised graph learning methods, marking a significant attempt in this domain.
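The parameter perturbation idea in the abstract can be illustrated with a minimal sketch. It assumes a toy one-layer encoder and Gaussian noise on the weights; the names `encode`, `perturb`, and `sigma` are hypothetical and are not taken from the paper. The point it shows is that the second contrastive view comes from a noisy copy of the encoder parameters, while the neighborhood structure passed to both views is identical.

```python
import math
import random

def encode(features, neighbors, weight):
    """Toy encoder (hypothetical): aggregate each cell with its neighbors,
    apply a linear map and tanh, then mean-pool over all cells."""
    dim_in, dim_out = len(weight), len(weight[0])
    pooled = [0.0] * dim_out
    for cell, nbrs in enumerate(neighbors):
        agg = [sum(features[n][k] for n in [cell] + nbrs) for k in range(dim_in)]
        for j in range(dim_out):
            pooled[j] += math.tanh(sum(agg[k] * weight[k][j] for k in range(dim_in)))
    return [p / len(neighbors) for p in pooled]

def perturb(weight, sigma, rng):
    """Return a noisy copy of the weights; the input weights are not modified."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weight]

rng = random.Random(0)
neighbors = [[1], [0, 2], [1]]                    # structure, shared by both views
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weight = [[0.3, -0.2], [0.1, 0.4]]

anchor = encode(features, neighbors, weight)                        # clean view
positive = encode(features, neighbors, perturb(weight, 0.1, rng))   # perturbed view
```

Here `anchor` and `positive` would form a positive pair in a contrastive loss. Because only the weights are perturbed, no cell or boundary relation is added or deleted, which is the property that structure-level augmentations such as edge dropping cannot guarantee.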

Paper Structure

This paper contains 26 sections, 3 theorems, 20 equations, 9 figures, 6 tables.

Key Result

Theorem 1

Let $f:\mathcal{G} \to \mathcal{X}$ be a skeleton‐preserving gluing process from graphs to cellular complexes. Let $G_1, G_2$ be graphs such that the 1‐WL test (and hence any GNN that is bounded by WL) cannot distinguish between them, i.e., $\mathrm{c}^{G_1,t} = \mathrm{c}^{G_2,t}$ for all iteratio
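A small worked example may help make the premise of the theorem concrete. The sketch below runs 1-WL color refinement on a classic pair, a hexagon and two disjoint triangles, which 1-WL cannot separate because every node has degree 2. It then compares the rings that a gluing process would attach as 2-cells, under the assumption that one 2-cell is attached per ring. The helper names are hypothetical, and `ring_sizes` is a shortcut that is valid only for 2-regular graphs; it is not the paper's lifting procedure.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """1-WL color refinement. Colors are nested tuples, so histograms
    from different graphs can be compared directly."""
    colors = {v: () for v in adj}
    for _ in range(rounds):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return Counter(colors.values())

def ring_sizes(adj):
    """Sizes of the rings that would become 2-cells. Shortcut valid only for
    2-regular graphs, where each connected component is a single cycle."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(adj[v])
        seen |= component
        sizes.append(len(component))
    return sorted(sizes)

def cycle(nodes):
    """Adjacency lists of a simple cycle over the given nodes."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}

hexagon = cycle([0, 1, 2, 3, 4, 5])
two_triangles = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}

same_under_wl = wl_histogram(hexagon) == wl_histogram(two_triangles)
different_rings = ring_sizes(hexagon) != ring_sizes(two_triangles)
```

Both flags evaluate to `True`: the color histograms coincide, while the hexagon yields one ring of size 6 and the two triangles yield two rings of size 3. This is the kind of higher-order information that becomes visible once 2-cells are attached.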

Figures (9)

  • Figure 1: Experimental scatter diagrams obtained by randomly trimming 2-cell cellular complex contrastive learning representations on the PROTEINS and IMDB-B datasets. The baseline and the red dashed line indicate the classification accuracy achieved using the complete 2-cell representations. The x-axis values represent a fixed proportion of 2-cell removal, with each proportion interval containing 50 points. Each individual point corresponds to an independent classification result achieved by randomly trimming the original 2-cell representations with a specific trimming ratio.
  • Figure 2: The gluing process of constructing a cellular complex from a graph is achieved through a sequence of continuous attaching maps.
  • Figure 3: The framework of $\text{CellCLAT}$. The blue dashed lines indicate the standard contrastive learning phase, where the encoder $f(\cdot;\Theta)$ and the projection head $g(\cdot;\Phi)$ are updated while keeping the Cellular Trimming Scheduler $\Psi$ fixed. The red dashed lines represent the update process of $\Psi$ through the bi-level optimization process.
  • Figure 4: Hyper-parameter sensitivity analysis.
  • Figure 5: t-SNE visualization of six methods on MUTAG.
  • ...and 4 more figures
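
The random trimming protocol described in the Figure 1 caption can be sketched as follows. This is only an illustration under stated assumptions: 2-cells are represented as tuples of node indices, `trim_two_cells` is a hypothetical helper, and the downstream step of re-encoding and classifying each trimmed complex is omitted.

```python
import random

def trim_two_cells(two_cells, ratio, rng):
    """Randomly drop a fixed proportion of 2-cells, keeping the original order
    of the survivors. Nodes and edges (0- and 1-cells) are not touched."""
    n = len(two_cells)
    keep = n - round(ratio * n)
    kept_indices = sorted(rng.sample(range(n), keep))
    return [two_cells[i] for i in kept_indices]

# Ten hypothetical 2-cells, each a ring given by its node indices.
two_cells = [(i, i + 1, i + 2) for i in range(10)]

# One trimming ratio, 50 independent draws, mirroring the 50 points per
# proportion interval mentioned in the caption.
draws = [trim_two_cells(two_cells, 0.3, random.Random(seed)) for seed in range(50)]
```

Each element of `draws` keeps 7 of the 10 rings. In the setting of Figure 1, each such draw would correspond to one point in the scatter plot once its classification accuracy is measured.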

Theorems & Definitions (10)

  • Theorem 1: CCNN is strictly more expressive than the WL test
  • Definition 1: Color Refinement on Cellular Complexes
  • Definition 2: Color Equivalence on Cellular Complexes
  • Definition 3: Topological Reduction
  • Lemma 1
  • Theorem 2
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7