SuperSAM: Crafting a SAM Supernetwork via Structured Pruning and Unstructured Parameter Prioritization

Waqwoya Abebe, Sadegh Jafari, Sixing Yu, Akash Dutta, Jan Strube, Nathan R. Tallent, Luanzheng Guo, Pablo Munoz, Ali Jannesari

TL;DR

This work introduces SuperSAM, a method to transform the Segment Anything Model (SAM) into a weight-sharing ViT-based supernetwork for neural architecture search. It proposes a two-dimensional elasticity strategy that combines probabilistic layer pruning (structured) with Wanda-inspired row/column-wise reordering and slicing of MLP blocks to craft a scalable search space and expand high-quality subnetworks. Trained via the sandwich rule on multiple vision datasets, SuperSAM enables the discovery of subnetworks that are 30-70% smaller than the original SAM ViT-B yet achieve comparable or superior performance, with deployment aided by OpenTuner for constrained search. The results demonstrate improved search-space design for transformer-based NAS and practical pathways to efficient, robust segmentation models across diverse data domains.

Abstract

Neural Architecture Search (NAS) is a powerful approach to automating the design of efficient neural architectures. In contrast to traditional NAS methods, recently proposed one-shot NAS methods search more efficiently. One-shot NAS works by generating a single weight-sharing supernetwork that acts as a search space (container) of subnetworks. Despite its achievements, designing the one-shot search space remains a major challenge. In this work, we propose a search-space design strategy for Vision Transformer (ViT)-based architectures. In particular, we convert the Segment Anything Model (SAM) into a weight-sharing supernetwork called SuperSAM. Our approach automates the search-space design via layer-wise structured pruning and parameter prioritization. While structured pruning probabilistically removes certain transformer layers, parameter prioritization reorders and slices the weights of the MLP blocks in the remaining layers. We train supernetworks on several datasets using the sandwich rule. For deployment, we enhance subnetwork discovery by using a program autotuner to identify efficient subnetworks within the search space. The resulting subnetworks are 30-70% smaller than the original pre-trained SAM ViT-B, yet outperform the pre-trained model. Our work introduces a new and effective method for ViT NAS search-space design.
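The abstract mentions probabilistic layer removal and sandwich-rule training without showing what a sampled subnetwork looks like. The following is a minimal sketch, not the authors' implementation. It assumes the standard form of the sandwich rule (each training step uses the largest subnetwork, the smallest subnetwork, and a few random ones). The names `SubnetConfig`, `PRUNE_PROB`, and `WIDTH_CHOICES`, the choice of prunable layers, their probabilities, and the candidate widths are all hypothetical. Only the depth (12 layers) and full MLP hidden width (3072) of ViT-B are taken as known.

```python
import random
from dataclasses import dataclass

NUM_LAYERS = 12                         # depth of the ViT-B image encoder
WIDTH_CHOICES = (1024, 2048, 3072)      # hypothetical MLP widths; 3072 is the full ViT-B width
PRUNE_PROB = {7: 0.5, 8: 0.5, 9: 0.5}   # hypothetical: prunable layer index -> removal probability


@dataclass(frozen=True)
class SubnetConfig:
    keep_layer: tuple   # one bool per transformer layer
    mlp_width: tuple    # MLP hidden width per layer (0 for a removed layer)


def make_config(keep, width_for_layer):
    """Build a config; removed layers get width 0."""
    widths = tuple(width_for_layer(i) if keep[i] else 0 for i in range(NUM_LAYERS))
    return SubnetConfig(tuple(keep), widths)


def largest_config():
    """Every layer kept at full MLP width."""
    return make_config([True] * NUM_LAYERS, lambda i: max(WIDTH_CHOICES))


def smallest_config():
    """Every prunable layer removed, remaining layers at the minimum width."""
    keep = [i not in PRUNE_PROB for i in range(NUM_LAYERS)]
    return make_config(keep, lambda i: min(WIDTH_CHOICES))


def random_config(rng):
    """Drop each prunable layer with its probability; pick a random width elsewhere."""
    keep = [rng.random() >= PRUNE_PROB.get(i, 0.0) for i in range(NUM_LAYERS)]
    return make_config(keep, lambda i: rng.choice(WIDTH_CHOICES))


def sandwich_batch(rng, n_random=2):
    """Configurations trained together in one step under the sandwich rule."""
    return [largest_config(), smallest_config()] + [random_config(rng) for _ in range(n_random)]


rng = random.Random(0)
for cfg in sandwich_batch(rng):
    print(sum(cfg.keep_layer), "layers kept; total MLP width", sum(cfg.mlp_width))
```

In an actual training step, the losses of all configurations in such a batch would typically be accumulated on the shared weights before a single optimizer update.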

Paper Structure (7 sections, 2 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: After freezing the prompt encoder and applying 2D elasticity to the image encoder, the image encoder and mask decoder are jointly optimized using the sandwich rule. As shown above, medium-sized subnetworks may or may not overlap with each other, while the smallest subnetwork lies in the intersection of all other subnetworks.
  • Figure 2: The NAS search space design involves two main operations. 1. Identify prunable layers and assign a pruning probability. 2. Apply weight reordering in MLP blocks of surviving layers before a slicing window operation. These operations significantly reduce the size of the subnetworks in the search space as well as improve the quality of the subnetwork candidates.
  • Figure 3: Comparison of mIoU and model size when pruning one or more layers from the SAM ViT-B image encoder.
  • Figure 4: Comparison of the Pareto frontiers of different reordering strategies on different tasks.
  • Figure 5: Comparison of the Pareto frontiers of different search-space design strategies. Supernet A was generated via 1D elasticity (windowing only); Supernet B uses the proposed 2D elasticity.
  • ...and 2 more figures
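
The second operation in the Figure 2 caption (reordering the weights of an MLP block, then slicing) can be illustrated with a small pure-Python sketch. This is an illustration under stated assumptions, not the paper's code. It uses the Wanda importance score |W_ij| * ||X_j||_2 for each first-layer weight and, as an assumed aggregation, sums those scores over each hidden neuron's row. The function names, the toy dimensions, and the random calibration inputs are hypothetical.

```python
import math
import random


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def mlp_forward(w1, b1, w2, b2, x):
    """Two-layer MLP: w1 is (hidden x in), w2 is (out x hidden)."""
    hidden = [gelu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def neuron_scores(w1, calib_inputs):
    """Per-neuron sum of |W_ij| * ||X_j||_2 (row-sum aggregation is an assumption)."""
    n_in = len(w1[0])
    norms = [math.sqrt(sum(x[j] ** 2 for x in calib_inputs)) for j in range(n_in)]
    return [sum(abs(w) * n for w, n in zip(row, norms)) for row in w1]


def reorder_mlp(w1, b1, w2, scores):
    """Sort hidden neurons by descending score: rows of w1/b1, columns of w2."""
    order = sorted(range(len(w1)), key=lambda i: scores[i], reverse=True)
    return ([w1[i] for i in order],
            [b1[i] for i in order],
            [[row[i] for i in order] for row in w2])


def slice_mlp(w1, b1, w2, width):
    """Keep only the first `width` hidden neurons of a reordered MLP."""
    return w1[:width], b1[:width], [row[:width] for row in w2]


# Toy MLP and calibration data (random, for illustration only).
rng = random.Random(0)
n_in, n_hidden, n_out = 4, 8, 4
w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [rng.uniform(-1, 1) for _ in range(n_out)]
calib = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(16)]

scores = neuron_scores(w1, calib)
w1_r, b1_r, w2_r = reorder_mlp(w1, b1, w2, scores)
w1_s, b1_s, w2_s = slice_mlp(w1_r, b1_r, w2_r, width=4)

x = calib[0]
full = mlp_forward(w1, b1, w2, b2, x)
reordered = mlp_forward(w1_r, b1_r, w2_r, b2, x)   # same function as `full`
sliced = mlp_forward(w1_s, b1_s, w2_s, b2, x)      # smaller subnetwork
```

Because the rows of the first layer and the columns of the second layer are permuted together, reordering leaves the full MLP's output unchanged up to floating-point rounding. The benefit appears only after slicing, where every narrower subnetwork retains the highest-scoring neurons.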