SlimSAM: 0.1% Data Makes Segment Anything Slim
Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang
TL;DR
SlimSAM tackles the challenge of compressing the Segment Anything Model (SAM) with extremely limited training data. It introduces alternate slimming, which prunes and distills decoupled sub-structures (embedding and bottleneck) in alternating steps, and disturbed Taylor pruning, which aligns the pruning criterion with the distillation target in a label-free setting. By decomposing the model and using dynamic distillation losses across the embedding and bottleneck stages, SlimSAM reduces parameters and MACs to $1.4\%$ and $0.8\%$ of the original, respectively, while using only $0.1\%$ of the training data, with inference speedups of up to $8.6\times$ on a Titan RTX. The approach outperforms existing SAM compression methods under the same or lower data budgets, demonstrating that knowledge can be inherited effectively from a pre-trained SAM with minimal data and compute.
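To make the motivation for disturbed Taylor pruning concrete, the sketch below computes a first-order Taylor importance score, $|w \cdot \partial L / \partial w|$ summed per output channel, on a toy linear layer. Before pruning, the student and teacher share the same weights, so a distillation loss against the unmodified teacher output has zero gradient and every score collapses to zero. Perturbing the target makes the scores informative. The Gaussian perturbation, the squared-error loss, the toy layer, and all function names here are illustrative assumptions, not details taken from this summary or from the released code.

```python
import random

def linear(W, x):
    """Compute y = W x for a weight matrix W (list of rows) and a vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def taylor_channel_importance(W, x, target):
    """First-order Taylor importance per output channel of y = W x.

    With L = 0.5 * ||W x - target||^2, dL/dW[i][j] = (y_i - target_i) * x_j.
    The importance of channel i is sum_j |W[i][j] * dL/dW[i][j]|.
    """
    y = linear(W, x)
    scores = []
    for row, y_i, t_i in zip(W, y, target):
        err = y_i - t_i
        scores.append(sum(abs(w_ij * err * x_j) for w_ij, x_j in zip(row, x)))
    return scores

rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 output channels
x = [rng.uniform(-1, 1) for _ in range(4)]

# Before pruning, the student is a copy of the teacher, so outputs match exactly.
teacher_out = linear(W, x)

# Undisturbed target: the loss gradient vanishes and all scores are zero.
plain_scores = taylor_channel_importance(W, x, teacher_out)

# Disturbed target (assumed form): add small Gaussian noise to the teacher output.
sigma = 0.01  # hypothetical noise scale
disturbed_target = [t + rng.gauss(0.0, sigma) for t in teacher_out]
disturbed_scores = taylor_channel_importance(W, x, disturbed_target)

# Keep the two highest-scoring channels; the remaining one would be pruned.
keep = sorted(range(len(W)), key=lambda i: disturbed_scores[i], reverse=True)[:2]
```

In this toy setting `plain_scores` contains only zeros, so it cannot rank channels, whereas `disturbed_scores` is strictly positive and yields a usable ranking in `keep`.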
Abstract
Current approaches for compressing the Segment Anything Model (SAM) yield commendable results, yet require extensive data to train a new network from scratch. Conventional pruning techniques can markedly reduce data requirements but degrade performance. To address this trade-off, we introduce SlimSAM, a novel data-efficient SAM compression method that achieves superior performance with far less training data. The essence of SlimSAM is the alternate slimming framework, which enhances knowledge inheritance under severely limited training data and exceptionally high pruning ratios. Unlike prior techniques, our framework progressively compresses the model by alternately pruning and distilling distinct, decoupled sub-structures. We also propose disturbed Taylor pruning to address the misalignment between the pruning objective and the training target, thereby improving distillation after pruning. SlimSAM yields significant performance improvements while requiring over 10 times less training data than any existing compression method. Even compared with the original SAM, SlimSAM approaches its performance while reducing the parameter count to merely 1.4% (9.1M) and MACs to 0.8% (23G), and requiring only 0.1% (10k) of the SAM training data. The code is available at http://github.com/czg1225/SlimSAM.
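As a quick sanity check on the reported ratios, the short calculation below backs out the approximate size of the original model and training set from the figures quoted in the abstract. The helper name is hypothetical, and because the quoted percentages are rounded, the results are order-of-magnitude estimates rather than exact values.

```python
def implied_original(compressed: float, fraction: float) -> float:
    """Return the original quantity implied by a compressed value and its ratio."""
    return compressed / fraction

# Figures quoted in the abstract: 9.1M params (1.4%), 23G MACs (0.8%), 10k images (0.1%).
orig_params = implied_original(9.1e6, 0.014)   # on the order of 650 million parameters
orig_macs = implied_original(23e9, 0.008)      # on the order of 2.9 trillion MACs
orig_images = implied_original(10_000, 0.001)  # on the order of 10 million images

print(f"params ~ {orig_params / 1e6:.0f}M")
print(f"MACs   ~ {orig_macs / 1e9:.0f}G")
print(f"images ~ {orig_images / 1e6:.0f}M")
```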
