AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

Mingcan Xiang; Steven Jiaxun Tang; Qizheng Yang; Hui Guan; Tongping Liu

TL;DR

AdapMTL tackles the challenge of efficiently pruning multitask learning (MTL) models by recognizing that the shared backbone and task-specific heads have different sensitivities to pruning. It introduces per-component learnable soft thresholds and an adaptive weighting mechanism, enabling co-optimization of sparsity allocation and model weights from scratch. The method consistently outperforms state-of-the-art pruning baselines on NYU-v2 and Tiny-Taskonomy across backbones, achieving high overall sparsity while maintaining or improving relative task performance, as captured by the unified metric $\triangle_T$. This approach advances practical, resource-efficient MTL deployments and shows potential for scalable pruning in multimodal and larger architectures like VideoBERT and beyond.

Abstract

In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams such as images, video, and sensor data is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a multitask model presents significant challenges because sparsity allocation must be balanced against accuracy across multiple tasks. To tackle these challenges, we propose AdapMTL, an adaptive pruning framework for MTL models. AdapMTL leverages multiple learnable soft thresholds, independently assigned to the shared backbone and the task-specific heads, to capture the nuances in different components' sensitivity to pruning. During training, it co-optimizes the soft thresholds and MTL model weights to automatically determine a suitable sparsity level for each component, achieving both high task accuracy and high overall sparsity. It further incorporates an adaptive weighting mechanism that dynamically adjusts the importance of task-specific losses based on each task's robustness to pruning. We demonstrate the effectiveness of AdapMTL through comprehensive experiments on popular multitask datasets, namely NYU-v2 and Tiny-Taskonomy, with different architectures, showcasing superior performance compared to state-of-the-art pruning methods.
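
The idea of giving each component its own soft threshold can be illustrated with a minimal sketch. It assumes the standard soft-thresholding operator sign(w) * max(|w| - t, 0), which may differ from the exact parameterization used in the paper, and it uses fixed thresholds where AdapMTL learns them during training. The function names, component names, weights, and threshold values below are all hypothetical and chosen only for illustration.

```python
def soft_threshold(w, t):
    """Shrink w toward zero by t; weights with |w| <= t become exactly 0."""
    mag = abs(w) - t
    if mag <= 0:
        return 0.0
    return mag if w > 0 else -mag


def hard_threshold(w, t):
    """Keep w unchanged if |w| > t, otherwise zero it (output jumps at |w| == t)."""
    return w if abs(w) > t else 0.0


def prune_component(weights, t):
    """Soft-threshold one component; return (pruned weights, fraction of zeros)."""
    pruned = [soft_threshold(w, t) for w in weights]
    sparsity = sum(1 for w in pruned if w == 0.0) / len(pruned)
    return pruned, sparsity


# Hypothetical toy components: (weights, threshold). Values are not from the paper.
components = {
    "backbone": ([0.9, -0.05, 0.3, -0.6], 0.1),
    "head_seg": ([0.2, -0.15, 0.05, 0.4], 0.25),
}

# Each component ends up with its own sparsity level, driven by its own threshold.
report = {name: prune_component(w, t)[1] for name, (w, t) in components.items()}
```

In this toy setup the component with the larger threshold ends up sparser, which mirrors how separate thresholds let the backbone and the heads settle at different sparsity levels. The soft operator also changes continuously as a weight crosses the threshold, whereas the hard operator jumps from 0 to the full weight value.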

Paper Structure (27 sections, 9 equations, 9 figures, 9 tables)

Introduction
Related Work
Methodology
Preliminary
Adaptive Multitask Model Pruning
Adaptive Weighting Mechanism
Experiments
Experiment Settings
Datasets and tasks
Evaluation Metrics and Loss Functions
Baselines for Comparison
Experiment Results
Results on NYU-v2
Results under various sparsity settings
Results on Tiny-Taskonomy
...and 12 more sections

Figures (9)

Figure 1: Overview of pruning a dense multitask model. The red parts represent the shared backbone, and the leaf boxes represent the task-specific heads. In the sparse model, the blank spaces indicate the pruned parameters.
Figure 2: Difference between hard and soft thresholding. Hard thresholding causes abrupt weight discontinuities during training, while soft thresholding ensures a smooth relationship for consistent learning.
Figure 3: Breakdown of component-wise sparsity allocation during training. We use the ResNet34 backbone and achieve 90% overall sparsity in the end.
Figure 4: Comparison of state-of-the-art methods, including DiSparse (Sun et al., 2022), LTH (Frankle and Carbin, 2019), SNIP (Lee et al., 2018), and IMP (Han et al., 2015), on the NYU-v2 dataset, evaluated with different MTL backbones and under various sparsity settings.
Figure 5: Visualization comparing the sensitivity of the backbone and task head in a MobileNetV2 backbone MTL model. The y-axis represents the total sparsity of all task heads.
...and 4 more figures
