LOTUS: Improving Transformer Efficiency with Sparsity Pruning and Data Lottery Tickets

Ojasw Upadhyay

TL;DR

The paper addresses the high computational cost of training Vision Transformers by proposing LOTUS, a framework that combines data-level lottery tickets with sparsity-based pruning to accelerate training. LOTUS first identifies informative data patches via attention maps (the data lottery tickets). It then reduces the parameter count with a two-stage pruning pipeline: Essential Sparsity applied to a pretrained model, followed by Instant Soup Pruning with a denoised mask. Finally, the pruned model is fine-tuned on the remaining data. Empirical results on CIFAR-10 with a pretrained ViT show that a model pruned to 30% sparsity can retain about 79% accuracy, and that data lottery tickets enable rapid convergence, reaching near-state-of-the-art performance by around epoch 5. The ISSP soup-pruning component struggles, however, delivering significantly lower accuracy (~50%), which suggests it needs further refinement. Overall, the work demonstrates the potential of integrating data selection with sparsity techniques for faster, more efficient training of vision transformers, and it outlines directions for future improvement and generalization.
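The two ideas summarized above, keeping only the most-attended patches and removing the smallest-magnitude weights, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes that patch selection is a top-k over per-patch attention scores and that the pruning step is a one-shot magnitude prune. The function names `select_top_patches` and `magnitude_prune`, the 50% keep ratio, and the random stand-in data are all hypothetical.

```python
import random


def select_top_patches(scores, keep_ratio):
    """Return the sorted indices of the highest-scoring patches.

    scores: one attention score per patch (hypothetical stand-in values).
    keep_ratio: fraction of patches to keep, in (0, 1].
    """
    n_keep = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|.

    Assumption: a one-shot magnitude prune over a flat list of weights.
    """
    n_prune = round(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


random.seed(0)

# Stand-in for attention scores over a 4x4 grid of image patches.
scores = [random.random() for _ in range(16)]
kept = select_top_patches(scores, keep_ratio=0.5)

# Stand-in for one layer's weights, pruned to 30% sparsity.
weights = [random.gauss(0.0, 1.0) for _ in range(80)]
pruned_weights = magnitude_prune(weights, sparsity=0.3)

print("kept patch indices:", kept)
print("zeroed weights:", sum(1 for w in pruned_weights if w == 0.0))
```

In a real setting the scores would come from a ViT's attention maps and the prune would typically be applied per layer or globally across the model's weight tensors. The flat lists here only keep the sketch self-contained.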

Abstract

Vision transformers have revolutionized computer vision, but their computational demands present challenges for training and deployment. This paper introduces LOTUS (LOttery Transformers with Ultra Sparsity), a novel method that leverages data lottery ticket selection and sparsity pruning to accelerate vision transformer training while maintaining accuracy. Our approach focuses on identifying and utilizing the most informative data subsets and eliminating redundant model parameters to optimize the training process. Through extensive experiments, we demonstrate the effectiveness of LOTUS in achieving rapid convergence and high accuracy with significantly reduced computational requirements. This work highlights the potential of combining data selection and sparsity techniques for efficient vision transformer training, opening doors for further research and development in this area.
