Efficient Task Grouping Through Samplewise Optimisation Landscape Analysis

Anshul Thakur, Yichen Huang, Soheila Molaei, Yujiang Wang, David A. Clifton

TL;DR

The paper tackles negative transfer in multi-task learning and gradient-based meta-learning with a scalable framework that infers inter-task affinities from a sample-wise optimisation landscape analysis, without training shared models. It builds a task graph from the SCA-derived affinities, uses Graph Attention Networks to capture higher-order interactions, and then applies Gaussian Mixture Model clustering to form task groups. The approach performs comparably to state-of-the-art task grouping methods while running approximately five times faster across 8 datasets. Limitations include possible misalignment between affinity-based task groupings and semantic task similarity, and memory demands for large models; as future work, the authors suggest coresets to further reduce computation and memory.

Abstract

Shared training approaches, such as multi-task learning (MTL) and gradient-based meta-learning, are widely used in machine learning applications, but they often suffer from negative transfer, which degrades performance on specific tasks. While several optimisation techniques have been developed to mitigate this issue for pre-selected task cohorts, identifying optimal task combinations for joint learning (known as task grouping) remains underexplored and computationally challenging, owing to the exponential growth in task combinations and the need for extensive training and evaluation cycles. This paper introduces an efficient task grouping framework designed to reduce the overwhelming computational demands of existing methods. The proposed framework infers pairwise task similarities through a sample-wise optimisation landscape analysis, eliminating the shared model training that existing methods require to infer task similarities. Once the task similarities are acquired, a graph-based clustering algorithm is employed to identify near-optimal task groups, providing an approximate yet efficient and effective solution to the originally NP-hard problem. Empirical assessments on 8 different datasets show a five-fold speed-up over previous state-of-the-art methods, with consistently comparable performance, confirming the framework's efficiency and effectiveness in task grouping.
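To give a feel for the growth in task combinations mentioned in the abstract, the following sketch counts the ways to split a set of tasks into groups. It assumes, purely for illustration, that groups are non-empty and non-overlapping and that every task belongs to exactly one group (the Bell numbers); the paper's own search space may be defined differently, and `bell_number` is a hypothetical helper name.

```python
def bell_number(n: int) -> int:
    """Count the partitions of n labelled tasks into non-empty, disjoint groups."""
    # Bell triangle: each new row starts with the last entry of the previous row,
    # and every further entry adds the entry to its left and the one above-left.
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


if __name__ == "__main__":
    # Assumed illustration only: disjoint groupings of 3, 8 and 20 tasks.
    for num_tasks in (3, 8, 20):
        print(num_tasks, bell_number(num_tasks))
```

Under this assumption, three tasks already admit 5 groupings and eight tasks admit 4140, each of which would need its own training and evaluation cycle in an exhaustive search.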

Paper Structure

This paper contains 20 sections, 2 theorems, 19 equations, 12 figures, 7 tables.

Key Result

Theorem 4.1

Let $\left(\mathbf{x}_i,\{y_i^t\}_{t=1}^T\right)$ be a sample used to update the shared parameters, initialised at $\mathbf{\theta}_0$, in a multi-task setup. Then the sample-wise gradient used to update $\mathbf{\theta}_0$ is the average of the task-specific sample-wise gradients, i.e. $\frac{1}{T} \sum_{t=1}^{T} \dots$
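The averaging property stated in Theorem 4.1 can be checked numerically on a toy problem. The sketch below is not the paper's setup: it assumes the joint objective is the unweighted mean of the task losses (as in the $\mathcal{J}=(\ell_1+\ell_2)/2$ used for Figure 2), and it uses a hypothetical shared linear model with fixed scalar task heads and squared-error losses. All function names and values are made up for illustration.

```python
from typing import Callable, List

Vector = List[float]


def task_loss(theta: Vector, x: Vector, y: float, head: float) -> float:
    """Squared error of one task: (head * <theta, x> - y)^2."""
    pred = head * sum(t * xi for t, xi in zip(theta, x))
    return (pred - y) ** 2


def task_grad(theta: Vector, x: Vector, y: float, head: float) -> Vector:
    """Analytic gradient of task_loss with respect to the shared theta."""
    pred = head * sum(t * xi for t, xi in zip(theta, x))
    return [2.0 * (pred - y) * head * xi for xi in x]


def joint_loss(theta: Vector, x: Vector, ys: Vector, heads: Vector) -> float:
    """Assumed multi-task objective: unweighted mean of the task losses."""
    return sum(task_loss(theta, x, y, h) for y, h in zip(ys, heads)) / len(ys)


def numerical_grad(f: Callable[[Vector], float], theta: Vector, eps: float = 1e-6) -> Vector:
    """Central finite-difference gradient of f at theta."""
    grad = []
    for k in range(len(theta)):
        up, down = theta[:], theta[:]
        up[k] += eps
        down[k] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


# Hypothetical sample with T = 3 task labels and a shared initialisation theta_0.
theta0 = [0.5, -1.0, 2.0]
x_i = [1.0, 2.0, -0.5]
ys = [1.0, 0.0, -2.0]
heads = [1.0, -0.5, 2.0]

# Average of the task-specific sample-wise gradients.
per_task = [task_grad(theta0, x_i, y, h) for y, h in zip(ys, heads)]
avg_grad = [sum(g[k] for g in per_task) / len(per_task) for k in range(len(theta0))]

# Gradient of the joint objective, computed independently by finite differences.
joint_grad = numerical_grad(lambda th: joint_loss(th, x_i, ys, heads), theta0)

max_gap = max(abs(a - b) for a, b in zip(avg_grad, joint_grad))
print("largest component-wise gap:", max_gap)
```

The two gradients agree up to finite-difference error, which is just linearity of differentiation applied to a mean of losses.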

Figures (12)

  • Figure 1: Illustration of a standard task grouping framework addressing three patient-care prediction tasks. A single multi-task model is replaced with two models, each trained on one of the identified task groups. Contrary to this illustration, some task grouping frameworks do not allow singleton task groups.
  • Figure 2: Illustration of the nature of the shared global optimum $\theta^{\star}$ as a function of the density of task-specific sample-wise local optima, $\theta_1^{\star}$ and $\theta_2^{\star}$ for tasks 1 and 2, respectively. Compared with sparser local optima (a), denser local optima (b) result in a $\theta^{\star}$ that yields a better average loss across both tasks, $\mathcal{J}=(\ell_1+\ell_2)/2$, as desired in multi-tasking and other information-sharing frameworks.
  • Figure 3: (a) t-SNE representations of sample-wise converged ResNet-50 models for the CIFAR-10, Street View House Numbers (SVHN) and colorectal histology datasets. (b) SCA task affinities computed from the density of task-specific sample-wise optima.
  • Figure 4: An illustration of higher-order interactions captured by GATs. The representation learned for the $i$-th node depends not only on its immediate neighbours but also on the neighbourhoods of those neighbours.
  • Figure 5: Illustration of the process of obtaining task groupings from pairwise SCA scores (the affinity matrix) using a graph attention network (GAT) and Gaussian mixture modelling.
  • ...and 7 more figures
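
The higher-order effect described in the Figure 4 caption can be seen with a toy message-passing sketch. This is not a GAT: uniform averaging over a node and its neighbours stands in for the learned attention coefficients, and the four-node path graph and feature values are hypothetical. It only shows that stacking two aggregation layers lets a node's representation depend on the neighbours of its neighbours.

```python
from typing import List


def mean_aggregate(features: List[float], neighbours: List[List[int]]) -> List[float]:
    """One layer: each node averages its own feature with its neighbours' features."""
    out = []
    for node, nbrs in enumerate(neighbours):
        group = [node] + nbrs
        out.append(sum(features[j] for j in group) / len(group))
    return out


# Hypothetical path graph 0 - 1 - 2 - 3; node 2 is two hops away from node 0.
neighbours = [[1], [0, 2], [1, 3], [2]]
base = [1.0, 2.0, 3.0, 4.0]
perturbed = [1.0, 2.0, 30.0, 4.0]  # only node 2's feature changes

one_layer_base = mean_aggregate(base, neighbours)
one_layer_pert = mean_aggregate(perturbed, neighbours)
two_layer_base = mean_aggregate(one_layer_base, neighbours)
two_layer_pert = mean_aggregate(one_layer_pert, neighbours)

# After one layer node 0 only sees node 1, so it is unaffected by node 2.
print("one layer, node 0:", one_layer_base[0], one_layer_pert[0])
# After two layers node 0 sees node 2 through node 1, so its value changes.
print("two layers, node 0:", two_layer_base[0], two_layer_pert[0])
```

In an actual GAT the uniform weights would be replaced by attention scores learned per edge, but the receptive-field argument is the same.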

Theorems & Definitions (4)

  • Theorem 4.1
  • proof
  • Theorem 4.2
  • proof