Parameter-efficient Multi-Task and Multi-Domain Learning using Factorized Tensor Networks

Yash Garg, Nebiyou Yismaw, Rakib Hyder, Ashley Prater-Bennette, M. Salman Asif

TL;DR

This paper tackles parameter efficiency in multi-task and multi-domain learning (MTL/MDL) by introducing Factorized Tensor Networks (FTN), which add task/domain-specific low-rank tensor updates to a frozen shared backbone. It formalizes the weights for task $t$ as $\mathcal{W}_t = \mathcal{W}_{\text{shared}} + \Delta \mathcal{W}_t$, where the update to each layer $l$ is a sum of $R$ rank-one tensors, $\Delta \mathcal{W}_{l,t} = \sum_{r=1}^R \mathbf{w}^{r}_{1,t} \otimes \mathbf{w}^{r}_{2,t} \otimes \mathbf{w}^{r}_{3,t}$. It also learns task-specific Batch Normalization layers, enabling incremental adaptation without forgetting. FTN is demonstrated on both convolutional and transformer backbones across diverse datasets (ImageNet-to-Sketch, DomainNet, NYUD, Visual Decathlon), achieving accuracy comparable to single-task/domain baselines while using a fraction of the additional parameters, and showing favorable training efficiency. The work offers a simple, architecture-agnostic plug-in module for scalable MTL/MDL; the rank $R$ can be chosen per task to control model complexity, and possible extensions include per-layer adaptive ranks and branched architectures for reduced latency.
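To make the update rule concrete, here is a minimal NumPy sketch of a single layer's update. It assumes the layer weight is a 3-way tensor of shape I × J × K (for example a convolution kernel laid out as output channels × input channels × flattened 3×3 spatial positions); this layout, the helper name `cp_delta`, and all sizes are illustrative assumptions, not taken from the paper. The sketch builds $\Delta \mathcal{W}_{l,t}$ as a sum of $R$ outer products, adds it to the frozen shared weight, and compares the task-specific storage $R(I+J+K)$ with the $I \cdot J \cdot K$ entries of a full copy.

```python
import numpy as np

def cp_delta(w1: np.ndarray, w2: np.ndarray, w3: np.ndarray) -> np.ndarray:
    """Return sum_r w1[r] ⊗ w2[r] ⊗ w3[r] for factors of shape (R, I), (R, J), (R, K)."""
    return np.einsum("ri,rj,rk->ijk", w1, w2, w3)

rng = np.random.default_rng(0)
# Hypothetical sizes; assumed 3-way layout: out channels, in channels, 3x3 kernel flattened.
I, J, K, R = 64, 32, 9, 3

w_shared = rng.standard_normal((I, J, K))                      # frozen, shared by all tasks
w1, w2, w3 = (rng.standard_normal((R, n)) for n in (I, J, K))  # task-specific factors

delta = cp_delta(w1, w2, w3)
w_t = w_shared + delta  # W_{l,t} = W_{l,shared} + ΔW_{l,t}

# Sanity check: einsum matches the explicit sum of R rank-one outer products.
explicit = sum(
    np.multiply.outer(np.multiply.outer(w1[r], w2[r]), w3[r]) for r in range(R)
)
print(np.allclose(delta, explicit))  # True

# Storage: R * (I + J + K) task-specific values instead of I * J * K.
print(R * (I + J + K), I * J * K)    # 315 18432
```

With these toy sizes the update stores 315 values instead of 18,432 (about 1.7%). The `einsum` call is simply a compact way to form all $R$ outer products and sum them in one step.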

Abstract

Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The primary challenge and opportunity lie in leveraging shared information across these tasks and domains to improve the efficiency of the unified network, whether measured in accuracy, storage cost, computation, or sample complexity. In this paper, we introduce a factorized tensor network (FTN) designed to achieve accuracy comparable to that of independent single-task or single-domain networks while introducing a minimal number of additional parameters. FTN incorporates task- or domain-specific low-rank tensor factors into a shared frozen network derived from a source model. This strategy allows adaptation to numerous target domains and tasks without catastrophic forgetting. Furthermore, FTN requires significantly fewer task-specific parameters than existing methods. We performed experiments on widely used multi-domain and multi-task datasets, using convolutional architectures with different backbones as well as a transformer-based architecture. Our findings indicate that FTN attains accuracy similar to that of single-task or single-domain methods while using only a fraction of additional parameters per task. The code is available at https://doi.org/10.24433/CO.7519211.v2.
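The absence of catastrophic forgetting follows from the structure: each task only reads the frozen shared weight and writes its own factors and Batch Normalization parameters. The sketch below illustrates this with a hypothetical `FactorizedLayer` container (not the authors' code). It assumes a 3-way weight tensor and initializes the last factor to zero so that a new task starts from the shared weights; that initialization is our assumption, not the paper's stated scheme. Adding a second task leaves the first task's effective weight bit-for-bit unchanged.

```python
import numpy as np

class FactorizedLayer:
    """Hypothetical FTN-style layer: frozen shared weight + per-task CP factors and BN params."""

    def __init__(self, w_shared: np.ndarray):
        self.w_shared = w_shared.copy()
        self.w_shared.flags.writeable = False  # frozen: no task may modify it in place
        self.tasks = {}

    def add_task(self, name: str, rank: int, rng: np.random.Generator) -> None:
        i, j, k = self.w_shared.shape
        self.tasks[name] = {
            # Assumption: the last factor starts at zero, so ΔW_{l,t} = 0 and the new
            # task initially reproduces the shared weights.
            "factors": [rng.standard_normal((rank, i)) * 0.01,
                        rng.standard_normal((rank, j)) * 0.01,
                        np.zeros((rank, k))],
            # Task-specific BatchNorm scale and shift, one pair per output channel.
            "bn_gamma": np.ones(i),
            "bn_beta": np.zeros(i),
        }

    def weight(self, name: str) -> np.ndarray:
        """Effective weight for one task: W_shared + sum_r w1[r] ⊗ w2[r] ⊗ w3[r]."""
        w1, w2, w3 = self.tasks[name]["factors"]
        return self.w_shared + np.einsum("ri,rj,rk->ijk", w1, w2, w3)

    def task_param_count(self, name: str) -> int:
        """Number of values stored for one task (factors + BN scale/shift)."""
        t = self.tasks[name]
        return sum(f.size for f in t["factors"]) + t["bn_gamma"].size + t["bn_beta"].size

rng = np.random.default_rng(0)
layer = FactorizedLayer(rng.standard_normal((16, 8, 9)))

layer.add_task("segmentation", rank=4, rng=rng)
# Stand-in for training: give the segmentation update nonzero values.
layer.tasks["segmentation"]["factors"][2] = rng.standard_normal((4, 9)) * 0.01
w_seg_before = layer.weight("segmentation")

layer.add_task("depth", rank=2, rng=rng)  # a new task arrives later
w_seg_after = layer.weight("segmentation")

print(np.array_equal(w_seg_before, w_seg_after))                     # True
print(layer.task_param_count("segmentation"), layer.w_shared.size)  # 164 1152
```

Because the shared array is marked read-only, an accidental in-place update by any task raises an error instead of silently changing the weights seen by other tasks.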

Paper Structure

This paper contains 21 sections, 7 equations, 8 figures, 13 tables.

Figures (8)

  • Figure 1: Overview of different MTL/MDL approaches and our proposed method. (a) Fine-Tuning trains the entire network for each task/domain. (b) Feature-Extractor trains a backbone network shared by all tasks/domains, with task/domain-specific heads. (c) Our proposed method, Factorized Tensor Network (FTN), adapts to a new task/domain by adding low-rank factors to shared layers. (d) Detailed overview of FTN: a single network is adapted to three downstream vision tasks (segmentation, depth, and surface normal estimation) by adding task-specific low-rank tensors ($\Delta \mathcal{W}_t$). Task/domain-specific blocks are shown in the same colors.
  • Figure 2: Accuracy vs. low-rank: We show the top-1 accuracy (%) for different domains against the rank used in our method. We start with the ‘only BN’ setup, in which no low-rank factors are added and only the Batch Normalization layers are task-specific. We then show the performance improvement from our approach as the rank R increases.
  • Figure S3: Performance on five domains of the ImageNet-to-Sketch dataset as we remove the low-rank parameters. The backbone layers from which they are removed are selected using a moving threshold; each marker is annotated with its threshold and, in parentheses, the number of affected layers.
  • Figure S4: Norm of low-rank factors in the adapted backbone layers for different domains of the ImageNet-to-Sketch dataset with $R=50$.
  • Figure S5: Norm of low-rank factors in the adapted backbone layers for different values of $R\in \{1,5,10,15,20,25,50\}$ with the WikiArt domain of the ImageNet-to-Sketch dataset.
  • ...and 3 more figures
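Figure S3 evaluates removing low-rank parameters from backbone layers selected by a moving threshold, and Figures S4 and S5 plot per-layer factor norms. The captions do not state which norm is used or exactly how the threshold selects layers, so the sketch below assumes the Frobenius norm of each layer's reconstructed update $\Delta \mathcal{W}_{l,t}$ (the figures may instead measure the factors directly) and removes the factors of every layer whose norm falls below the threshold. `delta_norm`, `prune_by_threshold`, and the toy rank-one layers are hypothetical.

```python
import numpy as np

def delta_norm(factors) -> float:
    """Frobenius norm of ΔW = sum_r w1[r] ⊗ w2[r] ⊗ w3[r] (assumed pruning criterion)."""
    w1, w2, w3 = factors
    return float(np.linalg.norm(np.einsum("ri,rj,rk->ijk", w1, w2, w3)))

def prune_by_threshold(layer_factors: dict, threshold: float):
    """Hypothetical rule: drop ΔW for every layer whose norm is below `threshold`.

    Returns (names of layers that keep their factors, number of removed layers,
    number of task-specific parameters saved).
    """
    kept, removed, saved = [], 0, 0
    for name, factors in layer_factors.items():
        if delta_norm(factors) < threshold:
            removed += 1
            saved += sum(f.size for f in factors)
        else:
            kept.append(name)
    return kept, removed, saved

def rank_one(scale: float, shape=(4, 3, 9)):
    """Toy rank-one factors whose update has Frobenius norm equal to `scale`."""
    factors = []
    for n in shape:
        f = np.zeros((1, n))
        f[0, 0] = 1.0
        factors.append(f)
    factors[0][0, 0] = scale
    return tuple(factors)

layers = {"layer1": rank_one(0.5), "layer2": rank_one(0.01),
          "layer3": rank_one(0.2), "layer4": rank_one(0.001)}

for threshold in (0.005, 0.05, 0.3):  # a moving threshold, in the spirit of Figure S3
    kept, removed, saved = prune_by_threshold(layers, threshold)
    print(f"threshold={threshold}: removed {removed} layer(s), kept {kept}, saved {saved} params")
```

Raising the threshold removes more layers: in this toy example, thresholds of 0.005, 0.05, and 0.3 remove 1, 2, and 3 of the 4 layers, respectively.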