SGW-based Multi-Task Learning in Vision Tasks

Ruiyuan Zhang, Yuyao Chen, Yuchi Huo, Jiaxiang Liu, Dianbing Xi, Jie Liu, Chao Wu

TL;DR

This paper proposes an information bottleneck knowledge extraction module (KEM) that reduces inter-task interference by constraining the flow of information, which also lowers computational complexity. The method is implemented and compared against existing approaches on multiple datasets.

Abstract

Multi-task learning (MTL) is a multi-target optimization task in which a neural network tries to realize each target using a shared interpretative space. However, as the scale of datasets expands and the complexity of tasks increases, knowledge sharing becomes increasingly challenging. In this paper, we first re-examine previous cross-attention MTL methods from the perspective of noise. We theoretically analyze this issue and identify it as a flaw in the cross-attention mechanism. To address it, we propose an information bottleneck knowledge extraction module (KEM). This module aims to reduce inter-task interference by constraining the flow of information, thereby reducing computational complexity. Furthermore, we employ neural collapse to stabilize the knowledge-selection process: before the features are fed to KEM, we project them into an equiangular tight frame (ETF) space. This mapping makes our method more robust. We implement the method and conduct comparative experiments on multiple datasets. The results demonstrate that our approach significantly outperforms existing methods in multi-task learning.
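The abstract does not say how the ETF projection is built. As a minimal sketch, the standard simplex ETF from the neural collapse literature can be constructed as below. The function names `simplex_etf` and `project_to_etf`, the random orthonormal basis, and the use of a plain matrix product as the "projection" are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def simplex_etf(num_vectors: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_vectors) matrix whose columns form a simplex ETF.

    Standard construction: M = sqrt(K / (K - 1)) * U @ (I - 11^T / K),
    where U has orthonormal columns. Requires dim >= num_vectors >= 2.
    """
    k = num_vectors
    if k < 2 or dim < k:
        raise ValueError("need dim >= num_vectors >= 2")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives orthonormal columns U.
    u, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * (u @ centering)


def project_to_etf(features: np.ndarray, etf: np.ndarray) -> np.ndarray:
    """Hypothetical projection: coordinates of (n, dim) features along the
    K ETF directions, giving an (n, K) array."""
    return features @ etf


etf = simplex_etf(num_vectors=4, dim=16)
gram = etf.T @ etf  # unit diagonal, every off-diagonal entry equals -1/(K-1)
tokens = np.random.default_rng(1).standard_normal((10, 16))
projected = project_to_etf(tokens, etf)
```

The Gram matrix is the useful check here: all columns have unit norm and every pair has the same inner product, which is the fixed, maximally separated geometry that neural collapse describes.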

Paper Structure (19 sections, 14 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Toy experiment to illustrate the concept of noise. For simplicity, we utilized a 2-layer single-head attention mechanism and added standard Gaussian noise to the hidden states to simulate irrelevant information in Multi-Task Learning (MTL).
  • Figure 2: Illustration of the knowledge-extraction module in multi-task learning based on the cross-attention mechanism. (a) Knowledge extraction from the feature spaces of other tasks, which has $\mathcal{O}(2n_s^2 \cdot d_e)$ computational complexity. (b) Attention weights computed with Softmax can be divided into noise weights and valuable weights; the noise weights are one reason for inter-task interference.
  • Figure 3: Illustration of the proposed KEM framework. (a) A frozen pre-trained Vision Transformer model is used for initial image encoding. (b) Each task learns its own encoder. (c) Features from different tasks interact through information bottleneck memory slots in three steps: Retrieve, Write, and Broadcast. See Section \ref{sec:kem} for details. (d) Each task learns its own decoder to decode data into the task-specific format.
  • Figure 4: NYUD-v2 validation results on semantic segmentation and depth estimation. Red boxes highlight regions of interest for comparing our method with the cross-attention baseline.
  • Figure 5: PASCAL validation results on human part segmentation, saliency estimation, and semantic segmentation. Red boxes highlight regions of interest for comparing our method with the cross-attention baseline. In blurry areas of the human body, KEM shows superior noise resistance.
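
The captions for Figures 2 and 3 contrast direct cross-attention between tasks with interaction through a small set of memory slots. The sketch below illustrates that contrast in the style of generic shared-workspace attention. It is an assumption-laden illustration, not the paper's KEM: the helper names, the absence of learned projections, the residual updates, and the slot count `m` are all hypothetical, and only a write step and a broadcast step are shown because the excerpt does not define the Retrieve step.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention without learned projections."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def slot_bottleneck(task_feats, slots):
    """task_feats: list of (n_s, d_e) arrays, one per task; slots: (m, d_e)."""
    all_tokens = np.concatenate(task_feats, axis=0)
    # Write (assumed form): the m slots query every task token, so the
    # score matrix has m rows instead of n_s rows.
    slots = slots + attend(slots, all_tokens, all_tokens)
    # Broadcast (assumed form): each task reads back from the m updated slots.
    updated = [f + attend(f, slots, slots) for f in task_feats]
    return updated, slots


rng = np.random.default_rng(0)
n_s, d_e, m, num_tasks = 64, 32, 8, 2  # illustrative sizes only
feats = [rng.standard_normal((n_s, d_e)) for _ in range(num_tasks)]
out, new_slots = slot_bottleneck(feats, rng.standard_normal((m, d_e)))

# Number of attention scores formed in each scheme for two tasks.
cross_attention_scores = 2 * n_s * n_s  # each task attends to the other
bottleneck_scores = m * num_tasks * n_s + num_tasks * n_s * m  # write + broadcast
```

With `m` much smaller than `n_s`, the slot route forms far fewer attention scores than pairwise cross-attention, and every piece of cross-task information has to pass through the `m` slots, which is the bottleneck effect the captions describe.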