Parameter-efficient Multi-Task and Multi-Domain Learning using Factorized Tensor Networks
Yash Garg, Nebiyou Yismaw, Rakib Hyder, Ashley Prater-Bennette, M. Salman Asif
TL;DR
This paper tackles parameter efficiency in multi-task learning (MTL) and multi-domain learning (MDL) by introducing Factorized Tensor Networks (FTN), which add task- or domain-specific low-rank tensor updates to a frozen shared backbone. It formalizes the weights for task $t$ as $\mathcal{W}_t = \mathcal{W}_{\text{shared}} + \Delta \mathcal{W}_t$, where the update at each layer $l$ is a sum of $R$ rank-one tensors, $\Delta \mathcal{W}_{l,t} = \sum_{r=1}^R \mathbf{w}^{r}_{1,t} \otimes \mathbf{w}^{r}_{2,t} \otimes \mathbf{w}^{r}_{3,t}$, and it learns task-specific Batch Normalization. Because the shared weights stay frozen, new tasks can be added incrementally without forgetting earlier ones. FTN is demonstrated on both convolutional and transformer backbones across diverse datasets (ImageNet-to-Sketch, DomainNet, NYUD, Visual Decathlon), achieving accuracy comparable to single-task and single-domain baselines while using a fraction of the additional parameters, and showing favorable training efficiency. The work offers a simple, architecture-agnostic plug-in module for scalable MDL and MTL, with a controllable rank $R$ that sets per-task complexity, and potential extensions to per-layer adaptive ranks and branched architectures for reduced latency.
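The update formula in the TL;DR can be illustrated with a small NumPy sketch. This is not the authors' implementation: the tensor sizes, the random initialization, and the names `low_rank_update`, `w_shared`, and `w1_t`, `w2_t`, `w3_t` are assumptions made for illustration, and the layer is assumed to be already arranged as a 3-way tensor.

```python
import numpy as np


def low_rank_update(w1, w2, w3):
    """Return sum_r w1[r] (outer) w2[r] (outer) w3[r].

    w1: (R, I), w2: (R, J), w3: (R, K)  ->  tensor of shape (I, J, K).
    """
    return np.einsum("ri,rj,rk->ijk", w1, w2, w3)


rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
I, J, K, R = 8, 6, 9, 4

# Stand-in for one layer of the frozen shared backbone.
w_shared = rng.standard_normal((I, J, K))

# Task-specific factors: the only tensors that would be trained for task t.
w1_t = rng.standard_normal((R, I))
w2_t = rng.standard_normal((R, J))
w3_t = rng.standard_normal((R, K))

delta_w_t = low_rank_update(w1_t, w2_t, w3_t)
w_t = w_shared + delta_w_t  # task-specific weights; w_shared is not modified
```

Because `w_shared` is only read and never written, adding another task in this sketch means allocating a new set of factors, which mirrors the incremental, no-forgetting property described above.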
Abstract
Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The primary challenge and opportunity lie in leveraging shared information across these tasks and domains to enhance the efficiency of the unified network. The efficiency can be in terms of accuracy, storage cost, computation, or sample complexity. In this paper, we introduce a factorized tensor network (FTN) designed to achieve accuracy comparable to that of independent single-task or single-domain networks while adding only a small number of parameters. The FTN approach incorporates task- or domain-specific low-rank tensor factors into a shared frozen network derived from a source model. This strategy allows adaptation to numerous target domains and tasks without catastrophic forgetting. Furthermore, FTN requires significantly fewer task-specific parameters than existing methods. We performed experiments on widely used multi-domain and multi-task datasets, using convolutional architectures with different backbones and a transformer-based architecture. Our findings indicate that FTN attains accuracy similar to that of single-task or single-domain methods while using only a fraction of additional parameters per task. The code is available at https://doi.org/10.24433/CO.7519211.v2.
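To make the "fraction of additional parameters" claim concrete, the arithmetic below compares a dense per-task copy of one layer with a rank-$R$ set of three factors. The layer shape (a 3x3 convolution with 256 input and 256 output channels, viewed as a 9 x 256 x 256 tensor) and the ranks are hypothetical, not values reported in the paper, and task-specific Batch Normalization parameters are not counted.

```python
def full_params(i, j, k):
    """Parameters in a dense per-task copy of an (i, j, k) weight tensor."""
    return i * j * k


def factor_params(i, j, k, rank):
    """Parameters in `rank` triples of factor vectors of lengths i, j, k."""
    return rank * (i + j + k)


# Hypothetical layer shape, assumed only for this example.
i, j, k = 9, 256, 256
dense = full_params(i, j, k)

# Added parameters per task for a few illustrative ranks.
added = {rank: factor_params(i, j, k, rank) for rank in (1, 4, 16)}

for rank, count in added.items():
    print(f"R={rank:>2}: {count} added vs {dense} dense "
          f"({100 * count / dense:.2f}% of the layer)")
```

Under these assumptions the added cost grows linearly in $R$ and in the sum of the tensor dimensions, rather than in their product, which is why a small rank keeps the per-task overhead low.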
