Exploring Correlations of Self-Supervised Tasks for Graphs
Taoran Fang, Wei Zhou, Yifei Sun, Kaiqiao Han, Lvbin Ma, Yang Yang
TL;DR
This work addresses the limited understanding of how graph self-supervised tasks correlate with one another and of how universal the learned representations are. It introduces Cor$(t_1,t_2)$, ATD, and ARL to quantify cross-task expressiveness and difficulty, revealing that correlations are dataset-specific and that naïve multi-task training often fails to yield universally strong representations. To address this, the authors propose GraphTCM, a correlation-modeling module that predicts inter-task correlations from task representations via a learned exponential attention mechanism and a congruent loss, enabling the training of representations with high cross-task capability. Empirically, GraphTCM reconstructs observed correlations with low error and yields representations that perform best on downstream node classification and link prediction across multiple datasets, significantly outperforming baseline multi-task and mixing strategies. The approach offers a principled way to model task relationships in graph SSL and demonstrates practical gains in robustness and generalization.
Abstract
Graph self-supervised learning has sparked a research surge in training informative representations without accessing any labeled data. However, our understanding of graph self-supervised learning remains limited, and the inherent relationships between various self-supervised tasks are still unexplored. Our paper aims to provide a fresh understanding of graph self-supervised learning based on task correlations. Specifically, we evaluate the performance of the representations trained by one specific task on other tasks and define correlation values to quantify task correlations. Through this process, we unveil the task correlations between various self-supervised tasks and measure their expressive capabilities, which are closely related to downstream performance. By analyzing the correlation values between tasks across various datasets, we reveal the complexity of task correlations and the limitations of existing multi-task learning methods. To obtain more capable representations, we propose Graph Task Correlation Modeling (GraphTCM), which models the task correlations and uses them to enhance graph self-supervised training. The experimental results indicate that our method significantly outperforms existing methods across various downstream tasks.
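The cross-task evaluation described in the abstract can be sketched as follows. The abstract does not state the formula for the correlation value, so this sketch assumes a ratio form: the loss that representations trained on task $t_1$ achieve on task $t_2$, divided by the loss that task $t_2$'s own representations achieve on $t_2$. The function name, the loss matrix, and its numbers are hypothetical and purely illustrative. The row and column averages are only one plausible way to aggregate such a matrix and are not claimed to match the paper's ATD or ARL definitions.

```python
import numpy as np


def correlation_matrix(losses):
    """Turn a cross-task loss matrix into ratio-style correlation values.

    ASSUMPTION: losses[i, j] is the loss on task j obtained with
    representations trained on task i, and the correlation value is
    taken to be losses[i, j] / losses[j, j]. The source text does not
    give this formula; it is an illustrative guess.
    """
    losses = np.asarray(losses, dtype=float)
    own = np.diag(losses)  # each task's loss with its own representations
    return losses / own[np.newaxis, :]  # normalize every column j by losses[j, j]


# Made-up loss values for three hypothetical self-supervised tasks.
toy_losses = np.array([
    [0.5, 2.0, 3.0],
    [1.0, 1.0, 1.5],
    [2.0, 4.0, 1.0],
])

cor = correlation_matrix(toy_losses)

# Row mean: how well one task's representations transfer to all tasks.
row_mean = cor.mean(axis=1)
# Column mean: how hard one task is for representations from all tasks.
col_mean = cor.mean(axis=0)
```

Under this assumed definition, every diagonal entry equals 1, and a value close to 1 off the diagonal means the representations trained on one task solve the other task almost as well as that task's own representations do.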
