SuperSAM: Crafting a SAM Supernetwork via Structured Pruning and Unstructured Parameter Prioritization
Waqwoya Abebe, Sadegh Jafari, Sixing Yu, Akash Dutta, Jan Strube, Nathan R. Tallent, Luanzheng Guo, Pablo Munoz, Ali Jannesari
TL;DR
This work introduces SuperSAM, a method to transform the Segment Anything Model (SAM) into a weight-sharing ViT-based supernetwork for neural architecture search. It proposes a two-dimensional elasticity strategy that combines probabilistic layer pruning (structured) with Wanda-inspired row/column-wise reordering and slicing of MLP blocks (unstructured parameter prioritization) to craft a scalable search space and expand the pool of high-quality subnetworks. Trained via the sandwich rule on multiple vision datasets, SuperSAM enables the discovery of subnetworks that are 30-70% smaller than the original SAM ViT-B yet achieve comparable or superior performance, with deployment aided by OpenTuner for constrained search. The results demonstrate improved search-space design for transformer-based NAS and practical pathways to efficient, robust segmentation models across diverse data domains.
Abstract
Neural Architecture Search (NAS) is a powerful approach to automating the design of efficient neural architectures. Compared with traditional NAS methods, recently proposed one-shot NAS methods are more efficient. One-shot NAS works by generating a single weight-sharing supernetwork that acts as a search space (container) of subnetworks. Despite its achievements, designing the one-shot search space remains a major challenge. In this work, we propose a search-space design strategy for Vision Transformer (ViT)-based architectures. In particular, we convert the Segment Anything Model (SAM) into a weight-sharing supernetwork called SuperSAM. Our approach automates search-space design via layer-wise structured pruning and parameter prioritization. Structured pruning probabilistically removes certain transformer layers, while parameter prioritization reorders and slices the weights of the MLP blocks in the remaining layers. We train supernetworks on several datasets using the sandwich rule. For deployment, we enhance subnetwork discovery by using a program autotuner to identify efficient subnetworks within the search space. The resulting subnetworks are 30-70% smaller than the original pre-trained SAM ViT-B, yet outperform the pre-trained model. Our work introduces a new and effective method for ViT NAS search-space design.
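
To make the parameter-prioritization idea concrete, the following is a minimal NumPy sketch of reordering and slicing a single MLP block. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy dimensions, and the way per-weight scores are summed into one score per hidden unit are all hypothetical. The score itself follows the Wanda idea of multiplying weight magnitude by the L2 norm of the matching input feature over a calibration batch.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP block: (tokens, d_in) -> (tokens, d_out)
    return gelu(x @ w1 + b1) @ w2 + b2

def hidden_unit_scores(w1, calib_x):
    # Wanda-style per-weight score: |W_ij| * ||x_i||_2 over calibration tokens.
    # Assumption: scores are summed over inputs to rank whole hidden units.
    feat_norm = np.linalg.norm(calib_x, axis=0)            # (d_in,)
    return (np.abs(w1) * feat_norm[:, None]).sum(axis=0)   # (d_hidden,)

def reorder_mlp(w1, b1, w2, scores):
    # Permute hidden units so the highest-scoring ones come first.
    # Columns of w1, entries of b1, and rows of w2 move together,
    # so the full-width block computes the same function as before.
    order = np.argsort(-scores, kind="stable")
    return w1[:, order], b1[order], w2[order, :], order

def slice_mlp(w1, b1, w2, width):
    # A narrower subnetwork keeps only the first `width` hidden units.
    return w1[:, :width], b1[:width], w2[:width, :]

# Toy dimensions (hypothetical, far smaller than a real ViT block).
rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 32, 8
w1 = rng.normal(size=(d_in, d_hidden))
b1 = rng.normal(size=d_hidden)
w2 = rng.normal(size=(d_hidden, d_out))
b2 = rng.normal(size=d_out)
calib_x = rng.normal(size=(64, d_in))

scores = hidden_unit_scores(w1, calib_x)
rw1, rb1, rw2, order = reorder_mlp(w1, b1, w2, scores)

full_out = mlp(calib_x, w1, b1, w2, b2)
reordered_out = mlp(calib_x, rw1, rb1, rw2, b2)

# Half-width subnetwork that shares the leading weights of the reordered block.
sw1, sb1, sw2 = slice_mlp(rw1, rb1, rw2, d_hidden // 2)
half_out = mlp(calib_x, sw1, sb1, sw2, b2)
```

Because reordering is a pure permutation of hidden units, `reordered_out` matches `full_out` up to floating-point rounding. Slicing then gives every narrower subnetwork the units ranked most important, which is the motivation for prioritizing parameters before weight sharing.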
