Parameter Aware Mamba Model for Multi-task Dense Prediction

Xinzhuo Yu, Yunzhi Zhuge, Sitong Gong, Lu Zhang, Pingping Zhang, Huchuan Lu

TL;DR

PAMM addresses global task interaction in multi-task dense prediction by combining a parameter-aware Mamba block with a Mixture-of-Experts (MoE) in the decoder. The block is built on the structured state space sequence model (S4) and a Multi-Directional Hilbert Scanning (MDHS) scheme. PAMM also introduces per-task parameter priors that encode task-specific properties and guide decoding. The architecture uses a Vision Transformer backbone and achieves a superior $\Delta_g$ over prior methods on NYUD-v2 and PASCAL-Context. Extensive ablations confirm the contributions of the MoE, the priors, and MDHS. Overall, PAMM offers a principled, scalable approach to globally coherent, prior-guided, task-conditioned dense prediction in vision.
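
The combination of a Mixture-of-Experts with per-task priors can be illustrated with a small NumPy layer that produces one state space parameter (for example B or C) from token features. This is a minimal sketch under stated assumptions, not the authors' implementation. Following the description of the Parameter Experts in Figure 2(b), a softmax router weights a set of shared experts, and an unshared task-specific path is added. The names `parameter_experts` and `softmax` are hypothetical. Using linear experts and adding the task prior as a learnable bias vector are also assumptions.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def parameter_experts(x, expert_W, router_W, task_W, task_prior):
    """Hypothetical parameter-expert layer producing one SSM parameter (e.g. B or C).

    x:          (L, d)    token features for one task
    expert_W:   (E, d, N) shared linear experts, each mapping d -> N
    router_W:   (d, E)    router projection
    task_W:     (d, N)    unshared, task-specific path
    task_prior: (N,)      learnable per-task prior (assumed additive)
    returns:    (L, N)    per-token parameter sequence
    """
    gates = softmax(x @ router_W)                       # (L, E), each row sums to 1
    expert_out = np.einsum("ld,edn->len", x, expert_W)  # (L, E, N) output of every expert
    mixed = np.einsum("le,len->ln", gates, expert_out)  # router-weighted fusion of experts
    return mixed + x @ task_W + task_prior              # add task-specific path and prior
```

One plausible reading of the abstract's "dual" parameter experts is one such layer for B and one for C. Under that reading, each layer is called per task with that task's own `task_W` and `task_prior`.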

Abstract

Understanding the inter-relations and interactions between tasks is crucial for multi-task dense prediction. Existing methods predominantly use convolutional layers and attention mechanisms to explore task-level interactions. In this work, we introduce a novel decoder-based framework, the Parameter Aware Mamba Model (PAMM), designed specifically for dense prediction in the multi-task learning setting. Unlike approaches that employ Transformers to model holistic task relationships, PAMM leverages the rich, scalable parameters of state space models to enhance task interconnectivity. It features dual state space parameter experts that integrate and set task-specific parameter priors, capturing the intrinsic properties of each task. This approach not only facilitates precise multi-task interactions but also enables the global integration of task priors through the structured state space sequence model (S4). Furthermore, we employ the Multi-Directional Hilbert Scanning method to construct multi-angle feature sequences, thereby enhancing the sequence model's perception of 2D data. Extensive experiments on the NYUD-v2 and PASCAL-Context benchmarks demonstrate the effectiveness of the proposed method. Our code is available at https://github.com/CQC-gogopro/PAMM.
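
Global integration through a state space model rests on a linear recurrence over the token sequence. The recurrence is $h_t = \exp(\Delta_t A)\, h_{t-1} + \Delta_t B_t x_t$ with output $y_t = C_t^\top h_t$. The loop below sketches this Mamba-style selective scan, the input-dependent variant of S4, for a single channel. It assumes a diagonal state matrix `A` and per-token `B` and `C`, which are the parameters PAMM's experts act on. `A` is discretized with a zero-order hold, while `B` uses the simplified $\Delta B$ form common in Mamba. It is a readable reference under these assumptions, not PAMM's optimized kernel, and `ssm_scan` is a hypothetical name.

```python
import numpy as np


def ssm_scan(x, A, B, C, delta):
    """Hypothetical single-channel selective state space scan.

    x:     (L,)   input sequence for one channel
    A:     (N,)   diagonal state matrix (non-positive entries keep the scan stable)
    B, C:  (L, N) per-token input and output parameters
    delta: (L,)   positive per-token step sizes
    returns: (L,) output sequence
    """
    L, N = B.shape
    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        A_bar = np.exp(delta[t] * A)   # zero-order-hold discretization of diagonal A
        B_bar = delta[t] * B[t]        # simplified discretization of B
        h = A_bar * h + B_bar * x[t]   # state update carries context from all earlier tokens
        y[t] = C[t] @ h                # read out the state with the per-token C
    return y
```

With `A = 0` and all other parameters set to one, the recurrence reduces to a running sum of `x`. A negative `A` makes the contribution of older tokens decay geometrically.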

Paper Structure

This paper contains 35 sections, 23 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Comparison with other multi-task learning methods. (a) CNN-based methods: These approaches are constrained by the local receptive fields of convolutions, hindering their ability to capture global context in multi-task scenarios, even with the inclusion of mixture of experts (MoE). (b) Transformer-based methods: While these methods can capture task-specific contexts, they lack inherent task priors. (c) Our PAME: By leveraging parameter experts, our method enables comprehensive parameter interactions across tasks and incorporates task priors to facilitate more effective task decoding.
  • Figure 2: Overview of the architecture. (a) The pipeline begins with task-specific local feature extraction through task convolutions. The Parameter Aware Mamba Experts (PAME) module then configures task experts within Mamba's parameter space to promote global task interaction, and it integrates task priors to enhance details. Finally, features from multiple scales are aggregated for task-specific decoding. (b) The structure of the Parameter Experts. To enable joint optimization across tasks, we build a mixture of task experts over two of Mamba's parameters (B and C). A routing network weights and fuses the experts' outputs, while a separate, unshared task-specific path is kept for each task.
  • Figure 3: Details of the proposed Parameter Aware Mamba Experts (PAME) module. Building on established Mamba-based methods, PAME integrates depth-wise separable convolutions (DWConv) [sifre2014rigid], gating, and skip connections. Its key innovations are Parameter Experts and Parameter Priors for optimizing the parameters B and C, together with a state space computation that uses Multi-Directional Hilbert Scanning (MDHS).
  • Figure 4: The proposed Multi-Directional Hilbert Scanning (MDHS) method serializes portions of the input parameters and the input $x$ following the Hilbert scanning approach. The output is then computed using a state equation, after which the original image order is restored through deserialization. Finally, the outputs from the scans in different directions are aggregated to produce the final result.
  • Figure 5: Visual comparison of multi-task predictions from our method and InvPT++ on the PASCAL-Context dataset. Our method generalizes better and captures the features of small objects more accurately.
  • ...and 5 more figures
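
The MDHS procedure in Figure 4 has four steps: serialize along a scan order, run the sequence model, deserialize, and aggregate across directions. It can be sketched in pure Python. The Hilbert index-to-coordinate conversion is the standard `d2xy` algorithm. Two choices are assumptions, since the caption does not specify them. The first is the set of four directions: the Hilbert curve, its reverse, its transpose, and the reversed transpose. The second is averaging as the aggregation step. The names `hilbert_order` and `mdhs` are hypothetical. `seq_fn` stands in for the state space scan, and the grid side must be a power of two.

```python
def d2xy(n, d):
    """Map Hilbert index d to (row, col) on an n x n grid (n a power of two)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the current sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n):
    """List of grid cells in Hilbert-curve order."""
    return [d2xy(n, d) for d in range(n * n)]


def mdhs(grid, seq_fn):
    """Hypothetical multi-directional Hilbert scan over a square grid.

    Serializes the grid along four Hilbert-based orders, applies seq_fn to
    each 1D sequence, scatters results back to their original cells
    (deserialization), and averages over the directions.
    """
    n = len(grid)
    base = hilbert_order(n)
    transposed = [(c, r) for r, c in base]
    orders = [base, base[::-1], transposed, transposed[::-1]]
    out = [[0.0] * n for _ in range(n)]
    for order in orders:
        seq = [grid[r][c] for r, c in order]      # serialize
        result = seq_fn(seq)                      # e.g. a state space scan
        for (r, c), v in zip(order, result):      # deserialize to image order
            out[r][c] += v / len(orders)          # aggregate by averaging
    return out
```

Passing the identity as `seq_fn` returns the original grid, which is a quick check that deserialization restores the image order. Consecutive cells in a Hilbert order are always spatial neighbors, which is the usual motivation for preferring it over a raster scan.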