CellCLAT: Preserving Topology and Trimming Redundancy in Self-Supervised Cellular Contrastive Learning
Bin Qin, Qirui Ji, Jiangmeng Li, Yupeng Wang, Xuesong Wu, Jianwen Cao, Fanjiang Xu
TL;DR
CellCLAT tackles the challenge of self-supervised learning in cellular topological deep learning by preserving higher-order topology and removing redundant topological information. It combines a topology-preserving, parameter-perturbed augmentation with a bi-level meta-learning framework that adaptively trims 2-cells, yielding refined representations. Theoretical results show CCNNs are strictly more expressive than the $3$-WL test and that topological redundancy acts as a confounder, addressed via do-calculus-inspired adjustments. Empirically, CellCLAT achieves state-of-the-art performance on six TU datasets in both unsupervised and semi-supervised settings, with strong ablations supporting the utility of 2-cell trimming and topology-preserving augmentation.
Abstract
Self-supervised topological deep learning (TDL) is a nascent and underexplored area with significant potential for modeling higher-order interactions in simplicial and cellular complexes to derive representations of unlabeled graphs. Compared to simplicial complexes, cellular complexes exhibit greater expressive power. However, progress in self-supervised learning for cellular TDL is largely hindered by two core challenges: \textit{extrinsic structural constraints} inherent to cellular complexes, and \textit{intrinsic semantic redundancy} in cellular representations. The first challenge highlights that traditional graph augmentation techniques may compromise the integrity of higher-order cellular interactions, while the second underscores that topological redundancy in cellular complexes potentially diminishes task-relevant information. To address these issues, we introduce Cellular Complex Contrastive Learning with Adaptive Trimming (CellCLAT), a twofold framework designed to adhere to the combinatorial constraints of cellular complexes while mitigating informational redundancy. Specifically, we propose a parameter perturbation-based augmentation method that injects controlled noise into cellular interactions without altering the underlying cellular structures, thereby preserving cellular topology during contrastive learning. Additionally, a cellular trimming scheduler masks gradient contributions from task-irrelevant cells through a bi-level meta-learning approach, removing redundant topological elements while maintaining critical higher-order semantics. We provide theoretical justification and empirical validation to demonstrate that CellCLAT achieves substantial improvements over existing self-supervised graph learning methods, marking a significant attempt in this domain.
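
The parameter perturbation-based augmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy encoder, the Gaussian noise scaled by each weight matrix's standard deviation, the InfoNCE loss, and the names `perturb_params`, `encode`, `info_nce`, and `eta` are all assumptions made for illustration. The point it shows is that the second contrastive view comes from noisy encoder weights, so the input structure itself is never edited.

```python
import numpy as np

def perturb_params(params, eta, rng):
    """Return a noisy copy of each weight matrix (assumed noise model:
    Gaussian, scaled by eta times the matrix's own standard deviation)."""
    return [w + eta * w.std() * rng.standard_normal(w.shape) for w in params]

def encode(features, params):
    """Hypothetical stand-in encoder: tanh layers, then mean pooling over cells."""
    h = features
    for w in params:
        h = np.tanh(h @ w)
    return h.mean(axis=0)

def info_nce(z1, z2, tau=0.5):
    """InfoNCE over a batch: row i of z1 and row i of z2 form a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
params = [rng.standard_normal((8, 16)), rng.standard_normal((16, 4))]

# Four toy complexes with differing numbers of cells, 8 features per cell.
batch = [rng.standard_normal((n, 8)) for n in (5, 7, 6, 9)]

# View 1: original weights. View 2: perturbed weights, same unmodified inputs.
noisy = perturb_params(params, eta=0.1, rng=rng)
z_clean = np.stack([encode(x, params) for x in batch])
z_noisy = np.stack([encode(x, noisy) for x in batch])
loss = info_nce(z_clean, z_noisy)
```

In this sketch `eta` controls how far the second view drifts from the first; setting it to zero makes the two views identical. The cellular trimming scheduler and its bi-level optimization are not modeled here.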
