Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning

Liwei Luo, Shuaitengyuan Li, Dongwei Ren, Qilong Wang, Pengfei Zhu, Qinghua Hu

TL;DR

The proposed Decoupled Multi-Predictor Optimization (DMPO) method separates the representative and discriminative abilities of early stages through both architecture design and model optimization, and it clearly outperforms its counterparts when computational cost is reduced.

Abstract

Recently, remarkable progress has been made in tuning large-scale pre-trained models, and inference efficiency is becoming increasingly crucial for practical deployment. Early exiting with multi-stage predictors, combined with a parameter-efficient fine-tuning strategy, offers a straightforward way to obtain an inference-efficient model. However, a key challenge remains unresolved: how can early stages provide low-level fundamental features to deep stages while simultaneously supplying high-level discriminative features to early-stage predictors? To address this problem, we propose a Decoupled Multi-Predictor Optimization (DMPO) method that decouples the low-level representative ability and the high-level discriminative ability of early stages. First, in terms of architecture, we introduce a lightweight bypass module into the multi-stage predictors to functionally decompose the shallow features from early stages, and we develop a high-order statistics-based predictor for early stages to enhance their discriminative ability. Second, to train this multi-predictor architecture properly, we propose a decoupled optimization that allocates two-phase loss weights to the multi-stage predictors during model tuning: the initial training phase lets the model prioritize the discriminative ability of deep stages by emphasizing the representative ability of early stages, and the latter training phase drives discriminative ability towards earlier stages as much as possible. As such, DMPO decouples representative and discriminative abilities in early stages through both architecture design and model optimization. Experiments across various datasets and pre-trained backbones demonstrate that DMPO clearly outperforms its counterparts when computational cost is reduced.
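The two-phase loss weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual schedule, so everything concrete here is an assumption made for illustration: the function names `two_phase_weights` and `multi_predictor_loss`, the linear ramp over stages, the normalization to sum 1, and the single `switch_epoch` boundary between the phases. The only property taken from the text is the direction of emphasis: deep-stage predictors dominate first, early-stage predictors dominate afterwards.

```python
from typing import List, Sequence


def two_phase_weights(epoch: int, switch_epoch: int, num_stages: int) -> List[float]:
    """Hypothetical per-stage loss weights (alpha), normalized to sum to 1.

    Assumption: a linear ramp over stage index. The paper's real schedule
    is not stated in the abstract.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if epoch < switch_epoch:
        # Initial phase: weights grow with depth, so deep predictors dominate
        # and early stages are mostly left to supply representative features.
        raw = [float(s + 1) for s in range(num_stages)]
    else:
        # Latter phase: weights shrink with depth, pushing discriminative
        # ability towards the earlier predictors.
        raw = [float(num_stages - s) for s in range(num_stages)]
    total = sum(raw)
    return [w / total for w in raw]


def multi_predictor_loss(stage_losses: Sequence[float], alphas: Sequence[float]) -> float:
    """Weighted sum of the per-stage predictor losses."""
    if len(stage_losses) != len(alphas):
        raise ValueError("one weight is required per stage loss")
    return sum(a * l for a, l in zip(alphas, stage_losses))


# Toy usage: four stages, phase switch at epoch 10 (both values are made up).
initial_alphas = two_phase_weights(epoch=0, switch_epoch=10, num_stages=4)
latter_alphas = two_phase_weights(epoch=10, switch_epoch=10, num_stages=4)
```

In a real training loop the per-stage losses would be tensors (for example, cross-entropy at each predictor) rather than floats; the weighting logic would stay the same.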

Paper Structure

This paper contains 45 sections, 5 equations, 12 figures, 15 tables.

Figures (12)

  • Figure 1: Comparison of various methods in terms of inference FLOPs and average accuracy on CIFAR-100 and five FGVC datasets, with corresponding results presented in Table \ref{tab:fgvc}.
  • Figure 2: Overview of our proposed DMPO method. Specifically, our DMPO consists of two components: (i) The left part of this figure is the multi-predictor architecture, inserting bypass modules, denoted as BYP, into the multi-predictors and replacing the early predictors with high-order statistics-based predictors; (ii) The right part is a decoupled optimization algorithm that adjusts multiple loss weights $\alpha$ in a two-phase manner to respectively emphasize representative ability (in the initial training phase) and discriminative ability (in the latter training phase) at early stages.
  • Figure 3: Schematic diagram of the early exiting network.
  • Figure 4: Comparison of two Dyn-Adapter \cite{zhang2024dynadapter} variants and our DMPO in terms of cosine similarity with Original ViT (Stage 1) and classification accuracy (Stage 1 and Stage 2).
  • Figure 5: Heat maps of different methods at Stage 1. Dyn-Adapter propagates the feature in (b) to the next stage while feeding it into the predictor simultaneously. In contrast, DMPO propagates the fundamental features in (c) to the next stage, while transmitting the discriminative features in (d) to the predictor.
  • ...and 7 more figures
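
The early-exiting scheme referred to in Figure 3 can be sketched as a loop over stages that stops at the first predictor that is confident enough. This is a generic illustration, not the paper's implementation: the exit rule (maximum softmax probability against a fixed `threshold`) is a common choice assumed here, and `early_exit_predict`, `toy_stages`, and `toy_predictors` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    features: Vector,
    stages: Sequence[Callable[[Vector], Vector]],
    predictors: Sequence[Callable[[Vector], Vector]],
    threshold: float,
) -> Tuple[int, int]:
    """Return (predicted class, index of the stage that produced it).

    Assumed exit rule: stop at the first stage whose predictor reaches
    `threshold` confidence; the last stage always answers.
    """
    if not stages or len(stages) != len(predictors):
        raise ValueError("need one predictor per stage, and at least one stage")
    last = len(stages) - 1
    for i, (stage, predictor) in enumerate(zip(stages, predictors)):
        features = stage(features)            # features passed on to deeper stages
        probs = softmax(predictor(features))  # this stage's class probabilities
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return probs.index(confidence), i
    raise AssertionError("unreachable: the last stage always returns")


# Toy model (made up): each stage doubles the features, and each predictor
# uses the features directly as logits, so confidence rises with depth.
toy_stages = [lambda x: [2.0 * v for v in x]] * 3
toy_predictors = [lambda x: list(x)] * 3
prediction, exit_stage = early_exit_predict([0.1, 0.5], toy_stages, toy_predictors, 0.8)
```

Raising `threshold` trades computation for accuracy: more inputs run through deeper stages before a predictor is trusted.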