Boosting Multitask Learning on Graphs through Higher-Order Task Affinities

Dongyue Li, Haotian Ju, Aneesh Sharma, Hongyang R. Zhang

TL;DR

This paper considers predicting multiple node labeling functions on graphs simultaneously and revisits this problem from a multitask learning perspective, developing an algorithm to cluster tasks into groups based on a higher-order task affinity measure.

Abstract

Predicting node labels on a given graph is a widely studied problem with many applications, including community detection and molecular graph prediction. This paper considers predicting multiple node labeling functions on graphs simultaneously and revisits this problem from a multitask learning perspective. For a concrete example, consider overlapping community detection: each community membership is a binary node classification task. Due to complex overlapping patterns, we find that negative transfer is prevalent when we apply naive multitask learning to multiple community detection, as task relationships are highly nonlinear across different node labeling. To address the challenge, we develop an algorithm to cluster tasks into groups based on a higher-order task affinity measure. We then fit a multitask model on each task group, resulting in a boosting procedure on top of the baseline model. We estimate the higher-order task affinity measure between two tasks as the prediction loss of one task in the presence of another task and a random subset of other tasks. Then, we use spectral clustering on the affinity score matrix to identify task grouping. We design several speedup techniques to compute the higher-order affinity scores efficiently and show that they can predict negative transfers more accurately than pairwise task affinities. We validate our procedure using various community detection and molecular graph prediction data sets, showing favorable results compared with existing methods. Lastly, we provide a theoretical analysis to show that under a planted block model of tasks on graphs, our affinity scores can provably separate tasks into groups.
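The affinity estimation and grouping steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy scoring function `toy_scores` (a stand-in for fitting a multitask GNN on a subset and evaluating each task), the hidden group structure, and the small NumPy-only spectral clustering routine are all assumptions made for the example. It also assumes scores are nonnegative and that higher means better, so that the averaged scores can be used directly as similarities.

```python
import numpy as np


def sample_subsets(num_tasks, subset_size, num_subsets, rng):
    """Draw random task subsets of a fixed size."""
    return [rng.choice(num_tasks, size=subset_size, replace=False)
            for _ in range(num_subsets)]


def affinity_matrix(num_tasks, subsets, scores):
    """theta[i, j] = mean score of task i over sampled subsets containing both i and j."""
    total = np.zeros((num_tasks, num_tasks))
    count = np.zeros((num_tasks, num_tasks))
    for subset, f in zip(subsets, scores):
        block = np.ix_(subset, subset)
        # Row i receives task i's score for every co-occurring task j.
        total[block] += np.asarray(f, dtype=float)[:, None]
        count[block] += 1
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)


def spectral_clusters(theta, num_groups, num_iters=50):
    """Simple spectral clustering: top eigenvectors of the normalized matrix, then k-means."""
    sym = (theta + theta.T) / 2                      # symmetrize the affinity scores
    deg = sym.sum(axis=1) + 1e-12
    normalized = sym / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(normalized)             # eigenvalues in ascending order
    emb = vecs[:, -num_groups:]
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    # Deterministic farthest-point initialization for k-means.
    centers = [emb[0]]
    for _ in range(1, num_groups):
        dist = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(num_iters):
        labels = np.argmin(np.linalg.norm(emb[:, None] - centers[None], axis=2), axis=1)
        for g in range(num_groups):
            if np.any(labels == g):
                centers[g] = emb[labels == g].mean(axis=0)
    return labels


def toy_scores(subset, hidden_group):
    """Hypothetical stand-in for training on a subset: fraction of same-group peers."""
    g = hidden_group[subset]
    same = (g[:, None] == g[None, :]).sum(axis=1) - 1  # exclude the task itself
    return same / (len(subset) - 1)


rng = np.random.default_rng(0)
num_tasks, num_groups = 12, 3
hidden_group = np.arange(num_tasks) // 4             # assumed ground truth for the toy
subsets = sample_subsets(num_tasks, subset_size=5, num_subsets=2000, rng=rng)
scores = [toy_scores(s, hidden_group) for s in subsets]
theta = affinity_matrix(num_tasks, subsets, scores)
labels = spectral_clusters(theta, num_groups)
print(labels)
```

In a real pipeline, `toy_scores` would be replaced by an actual training and evaluation run per subset, and one multitask model would then be fit on each resulting task group.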

Paper Structure (26 sections, 2 theorems, 12 equations, 6 figures, 7 tables, 1 algorithm)


Key Result

Lemma 6.1

In the setting described above, let the projection matrix $\tilde{\Sigma}$ be given by $\tilde{\Sigma} = \tilde{P}_{_G} \tilde{X} ( \tilde{X}^\top \tilde{P}_{_G}^\top \tilde{P}_{_G} \tilde{X} )^\dagger \tilde{X}^\top \tilde{P}_{_G}^\top$. For any $1 \le i, j \le T$, we have that the relevance score
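The matrix $\tilde{\Sigma}$ in the lemma statement has the form $M (M^\top M)^\dagger M^\top$ with $M = \tilde{P}_{_G} \tilde{X}$, which is the orthogonal projection onto the column space of $M$. The sketch below checks this numerically. The setting of the lemma is not reproduced in this excerpt, so the shapes and the random stand-ins for $\tilde{P}_{_G}$ and $\tilde{X}$ are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_features = 8, 3                       # hypothetical sizes

P = rng.random((num_nodes, num_nodes))               # stand-in for \tilde{P}_G
X = rng.standard_normal((num_nodes, num_features))   # stand-in for \tilde{X}

M = P @ X
# Sigma = P X (X^T P^T P X)^dagger X^T P^T, using the Moore-Penrose pseudoinverse.
Sigma = M @ np.linalg.pinv(M.T @ M) @ M.T

is_symmetric = np.allclose(Sigma, Sigma.T, atol=1e-8)
is_idempotent = np.allclose(Sigma @ Sigma, Sigma, atol=1e-8)
fixes_columns = np.allclose(Sigma @ M, M, atol=1e-8)
print(is_symmetric, is_idempotent, fixes_columns)
```

Symmetry and idempotence together characterize an orthogonal projection, and `Sigma @ M == M` confirms that the columns of $M$ lie in its range.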

Figures (6)

  • Figure 1: Overview of our boosting procedure: (1) We sample random subsets of tasks, each subset containing a fixed number of tasks. (2) For each subset $S_k$, for $k = 1, 2, \dots, n$, we fit a multitask learning (MTL) model on the combined data sets of all tasks in $S_k$, using a graph neural network (GNN) as the shared encoder. After fitting the MTL model, we evaluate its prediction performance for each task $i \in S_k$, denoted as $f_i(S_k)$. (3) We compute an affinity score $\theta_{i, j}$ by averaging task $i$'s scores among all subsets as in equation \ref{eq_aff}, where $n_{i, j}$ is the number of subsets including both $i, j$. This results in a $T$ by $T$ affinity matrix, denoted as $[\theta_{i, j}]_{T\times T}$. (4) We apply spectral clustering on this matrix to find clusters of task groups and fit one GNN for each task group.
  • Figure 2: This figure illustrates the widespread negative transfer effect among tasks by noting that MTL performance can dip below STL for four separate (randomly selected) target tasks. We fix a target task $i$ for each plot, then randomly pick ten source tasks (out of $100$) and for each source task $j$ train an MTL with $i$ and $j$; we report the MTL accuracy for $i$ minus $i$'s STL accuracy. Thus, bars above zero indicate positive transfers from source to target tasks, while bars below zero indicate negative transfers.
  • Figure 3: In each subfigure, we visualize the personalized PageRank vectors of a set of nodes in one community. They differ dramatically across non-overlapping communities.
  • Figure 4: (\ref{fig_model_capacity}) We show a consistent negative transfer even after increasing the model size (measured by width). (\ref{fig_mon}) The $x$-axis refers to the number of added source tasks to train with the target task. The $y$-axis refers to the difference in performance between MTL and STL (with the target task alone). We observe that the MTL performance of a target task starts to decrease after adding two or more source tasks, even though these are all "positive" source tasks (in the sense of pairwise transfer). (\ref{fig_sub}) Under the presence of a negatively interfering source task, the benefit of adding more "positive" tasks diminishes, implying that the $f(\cdot)$ function is not submodular.
  • Figure 5: We use task affinity scores from tasks in a subset $S$ to task $i$ to predict whether training with subset $S$ decreases the STL performance of task $i$. Left: Compared with two first-order task affinity scores, our higher-order task affinity scores achieve a consistently better F1-score for predicting negative transfers of combining up to $\alpha = 20$. Right: The F1-score for predicting negative transfers converges when the number of sampled subsets $n$ reaches $2000$. Results consistently hold for different subset sizes.
  • ...and 1 more figure
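The transfer measure used in the Figure 2 caption (MTL accuracy of the target task minus its STL accuracy, per source task) can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical and the accuracy values are made up for the example, not taken from the paper.

```python
def transfer_gains(stl_acc, mtl_acc_by_source):
    """Gain for each source task j: accuracy of target i trained with j, minus STL accuracy of i."""
    return {src: acc - stl_acc for src, acc in mtl_acc_by_source.items()}


def split_by_sign(gains):
    """Separate source tasks into positive and negative transfers for the target task."""
    positive = sorted(src for src, gain in gains.items() if gain > 0)
    negative = sorted(src for src, gain in gains.items() if gain < 0)
    return positive, negative


# Made-up numbers: STL accuracy of one target task and its MTL accuracy
# when trained together with each of four hypothetical source tasks.
stl_acc = 0.80
mtl_acc = {3: 0.83, 17: 0.76, 42: 0.81, 58: 0.79}

gains = transfer_gains(stl_acc, mtl_acc)
positive, negative = split_by_sign(gains)
print(positive, negative)
```

Source tasks with a gain above zero correspond to bars above zero in the figure, and those with a gain below zero to bars below zero.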

Theorems & Definitions (4)

  • Lemma 6.1
  • Theorem 6.3
  • Claim A.1
  • Claim A.2