LOTUS: Improving Transformer Efficiency with Sparsity Pruning and Data Lottery Tickets

Ojasw Upadhyay

TL;DR

The paper addresses the high computational cost of training Vision Transformers by proposing LOTUS, a framework that combines data-level lottery tickets with sparsity-based pruning to accelerate training. LOTUS first identifies informative data patches via attention maps (data lottery tickets), then reduces parameters with a two-stage pruning pipeline (Essential Sparsity on a pretrained model, then Instant Soup Pruning with a denoised mask), and finally fine-tunes on the remaining data. On CIFAR-10 with a pretrained ViT, pruning to 30% sparsity retains about 79% accuracy, and data lottery tickets enable rapid convergence, reaching near-state-of-the-art performance by around epoch 5. The ISSP component, however, reaches only about 50% accuracy, which suggests it needs further refinement. Overall, the work demonstrates the potential of integrating data selection with sparsity techniques for faster, more efficient training of vision transformers, and it outlines directions for future improvement and generalization.

Abstract

Vision transformers have revolutionized computer vision, but their computational demands present challenges for training and deployment. This paper introduces LOTUS (LOttery Transformers with Ultra Sparsity), a novel method that leverages data lottery ticket selection and sparsity pruning to accelerate vision transformer training while maintaining accuracy. Our approach focuses on identifying and utilizing the most informative data subsets and eliminating redundant model parameters to optimize the training process. Through extensive experiments, we demonstrate the effectiveness of LOTUS in achieving rapid convergence and high accuracy with significantly reduced computational requirements. This work highlights the potential of combining data selection and sparsity techniques for efficient vision transformer training, opening doors for further research and development in this area.
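
To make the sparsity side of the summary concrete, the sketch below builds a mask that removes the 30% of weights with the smallest magnitude from a single weight matrix. It assumes a plain one-shot magnitude criterion, which may differ from the paper's Essential Sparsity and Instant Soup Pruning stages; the function name `magnitude_prune_mask` and the random weight matrix are hypothetical and used only for illustration.

```python
import numpy as np

def magnitude_prune_mask(weights, sparsity):
    """Return a boolean mask that keeps the largest-magnitude weights.

    sparsity is the fraction of weights to remove (0.3 removes 30%).
    Assumption: one-shot global magnitude pruning of a single tensor.
    """
    flat = np.abs(weights).ravel()
    n_prune = int(round(sparsity * flat.size))
    mask = np.ones(flat.size, dtype=bool)
    if n_prune > 0:
        # Indices of the n_prune smallest-magnitude entries.
        prune_idx = np.argpartition(flat, n_prune - 1)[:n_prune]
        mask[prune_idx] = False
    return mask.reshape(weights.shape)

# Hypothetical stand-in for one layer of a pretrained model.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 10))
mask = magnitude_prune_mask(w, 0.3)
pruned = w * mask  # pruned weights are set to zero
```

In a real model the same mask would typically be held fixed during fine-tuning so that pruned weights stay at zero.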

Paper Structure (5 sections, 4 figures)

Figures (4)

  • Figure 1: The plot shows the accuracy of the model at different sparsity levels.
  • Figure 2: An example of data lottery tickets created using attention maps with 10% of the data patches removed.
  • Figure 3: The accuracy and loss plots of the model fine-tuned on the data-level lottery tickets.
  • Figure 4: The accuracy of the model after applying the ISSP approach.
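
As an illustration of what Figure 2 describes, the sketch below removes the 10% of patches with the lowest attention score from one image. It is a minimal sketch under stated assumptions: "removed" is taken to mean zeroed out, the per-patch scores are random placeholders rather than real ViT attention maps, and the function name `mask_low_attention_patches` and the 4-pixel patch size are hypothetical.

```python
import numpy as np

def mask_low_attention_patches(image, attn, patch, drop_frac):
    """Zero out the drop_frac lowest-attention patches of an (H, W, C) image.

    attn holds one score per patch, with shape (H // patch, W // patch).
    Returns the masked image and the per-patch keep mask.
    """
    n_drop = int(round(drop_frac * attn.size))
    keep = np.ones(attn.size, dtype=bool)
    if n_drop > 0:
        # Patches with the smallest scores are dropped.
        keep[np.argsort(attn.ravel())[:n_drop]] = False
    keep = keep.reshape(attn.shape)
    # Expand the per-patch mask to pixel resolution.
    pixel_mask = keep.repeat(patch, axis=0).repeat(patch, axis=1)
    return image * pixel_mask[..., None], keep

# CIFAR-10 sized input: 32x32x3, split into an 8x8 grid of 4x4 patches.
rng = np.random.default_rng(0)
image = np.ones((32, 32, 3))
attn = rng.random((8, 8))  # placeholder scores, not real attention
masked, keep = mask_low_attention_patches(image, attn, patch=4, drop_frac=0.10)
```

With 64 patches and a 10% drop fraction, six patches are zeroed and the remaining 58 are left unchanged.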