Utilizing dynamic sparsity on pretrained DETR

Reza Sedghi, Anand Subramoney, David Kappel

TL;DR

This work tackles efficient inference for vision transformers by exploiting activation sparsity in a pretrained DETR without full retraining. It introduces two methods that predict which MLP activations can be skipped: SIBS, a static heuristic, and MGS, a dynamic, input-adaptive gate trained on top of the pretrained model. On COCO, SIBS yields limited gains, while MGS reaches activation sparsity of roughly $0.85$ to $0.95$ with performance near or above the baseline and substantial FLOP reductions. The results indicate that lightweight sparsification added on top of a pretrained model can accelerate vision transformers in practice, easing deployment of pretrained DETR-like models.

Abstract

Efficient inference with transformer-based models remains a challenge, especially in vision tasks like object detection. We analyze the inherent sparsity in the MLP layers of DETR and introduce two methods to exploit it without retraining. First, we propose Static Indicator-Based Sparsification (SIBS), a heuristic method that predicts neuron inactivity based on fixed activation patterns. While simple, SIBS offers limited gains due to the input-dependent nature of sparsity. To address this, we introduce Micro-Gated Sparsification (MGS), a lightweight gating mechanism trained on top of a pretrained DETR. MGS predicts dynamic sparsity using a small linear layer and achieves activation sparsity of up to 85-95%. Experiments on the COCO dataset show that MGS maintains or even improves performance while significantly reducing computation. Our method offers a practical, input-adaptive approach to sparsification, enabling efficient deployment of pretrained vision transformers without full model retraining.
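To make the gating idea concrete, here is a minimal sketch of a feedforward block with a per-group gate, written in dependency-free Python. It follows only what the abstract and the Figure 1 caption state: a small linear layer with a sigmoid, placed before the first linear layer, that masks small groups of neurons. Everything else is an assumption made for illustration: the function names, the 0.5 threshold, the group size, and the random (untrained) gate weights. It is not the authors' implementation, and it says nothing about how the gate is trained.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(W, b, x):
    """Dense layer W @ x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def rand_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for the demo."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def gated_ff(x, W1, b1, W2, b2, Wg, bg, group_size, threshold=0.5):
    """Linear -> ReLU -> linear block with one gate per group of hidden neurons.

    The threshold and the grouping layout are assumptions, not values
    taken from the paper. Returns (output, active_group_flags).
    """
    # One sigmoid gate score per group, computed from the input alone.
    gates = [sigmoid(g) for g in linear(Wg, bg, x)]
    active = [g > threshold for g in gates]

    hidden = [0.0] * len(W1)
    for grp, on in enumerate(active):
        if not on:
            continue  # gated-off groups are never computed
        for j in range(grp * group_size, (grp + 1) * group_size):
            pre = sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]
            hidden[j] = max(0.0, pre)  # ReLU

    # Second linear layer: zero hidden units contribute nothing and are skipped.
    out = [
        sum(W2[i][j] * h for j, h in enumerate(hidden) if h != 0.0) + b2[i]
        for i in range(len(W2))
    ]
    return out, active


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_hidden, group_size = 8, 16, 4  # toy sizes, not DETR's
    n_groups = d_hidden // group_size
    W1, b1 = rand_matrix(d_hidden, d_model, rng), [0.0] * d_hidden
    W2, b2 = rand_matrix(d_model, d_hidden, rng), [0.0] * d_model
    Wg, bg = rand_matrix(n_groups, d_model, rng), [0.0] * n_groups
    x = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]

    out, active = gated_ff(x, W1, b1, W2, b2, Wg, bg, group_size)
    print("fraction of groups skipped:", 1.0 - sum(active) / len(active))
```

Because the gate has one output per group rather than per neuron, its cost stays small relative to the first linear layer, which is the design point the "micro-gated" name suggests. With untrained random gate weights the skipped fraction printed above is arbitrary; the sparsity levels reported in the abstract come from a trained gate.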

Paper Structure

This paper contains 9 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: (Top) Standard feedforward (FF) block used in DETR’s encoder and decoder layers. Blue represents the input hidden state, followed by a linear layer (gray), ReLU activation, and a second linear layer. (Bottom) Our proposed Micro-Gated Sparsification (MGS) structure. A lightweight gating layer (green) with sigmoid activation is added before the first linear layer to dynamically mask small groups of neurons based on input, reducing unnecessary computation at inference time.
  • Figure 2: Activation masking in DETR using random vs. Top-K strategies across blocks. Top-K consistently reveals higher sparsity potential, highlighting the need for structured, input-aware sparsification.
  • Figure 3: FLOPS reduction
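
The comparison described in the Figure 2 caption, random versus Top-K masking of activations, can be illustrated with a small sketch. This section does not give the paper's exact protocol or metric, so the code below is a stand-in under stated assumptions: it masks synthetic post-ReLU activations and measures the L2 error of the second linear layer's output as a proxy, not detection accuracy. All function names and the synthetic data are hypothetical.

```python
import random


def topk_mask(acts, keep):
    """Keep the `keep` largest activations and zero the rest."""
    order = sorted(range(len(acts)), key=lambda j: acts[j], reverse=True)
    kept = set(order[:keep])
    return [a if j in kept else 0.0 for j, a in enumerate(acts)]


def random_mask(acts, keep, rng):
    """Keep `keep` randomly chosen activations and zero the rest."""
    kept = set(rng.sample(range(len(acts)), keep))
    return [a if j in kept else 0.0 for j, a in enumerate(acts)]


def project(W2, hidden):
    """Second linear layer without bias: W2 @ hidden."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def l2_error(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    d_hidden, d_model, keep = 64, 8, 8  # toy sizes, not DETR's

    # Synthetic post-ReLU activations: many exact zeros, a few large values.
    acts = [max(0.0, rng.gauss(-1.0, 1.0)) for _ in range(d_hidden)]
    W2 = [[rng.uniform(-1.0, 1.0) for _ in range(d_hidden)] for _ in range(d_model)]

    reference = project(W2, acts)
    err_topk = l2_error(reference, project(W2, topk_mask(acts, keep)))
    err_rand = l2_error(reference, project(W2, random_mask(acts, keep, rng)))
    print("output error, Top-K mask: ", err_topk)
    print("output error, random mask:", err_rand)
```

Top-K masking needs the activations to be computed before they can be ranked, so it serves as a diagnostic of how much sparsity is available rather than as a way to save computation. That gap is what an input-aware predictor placed before the first linear layer is meant to close.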