ALPINE: An adaptive language-agnostic pruning method for language models for code
Mootez Saad, José Antonio Hernández López, Boqi Chen, Dániel Varró, Tushar Sharma
TL;DR
ALPINE tackles the resource intensity of language models for code by introducing an adaptive token-pruning technique that is language-agnostic and plug-and-play for Transformer encoders. It computes per-token importance from attention probabilities and prunes tokens whose importance falls outside a dynamic range, reducing input length and FLOPs while preserving performance. Across two SE tasks and three models, ALPINE achieves substantial reductions in FLOPs, memory footprint, and CO2 emissions with minimal accuracy loss, demonstrating practical gains for deploying code-aware LMs on consumer-grade hardware. This work highlights redundancy in source-code corpora and paves the way for more accessible, sustainable software engineering with Transformer-based models.
Abstract
Language models of code have demonstrated state-of-the-art performance across various software engineering and source code analysis tasks. However, their demanding computational resource requirements and consequent environmental footprint remain significant challenges. This work introduces ALPINE, an adaptive programming language-agnostic pruning technique designed to substantially reduce these models' computational overhead. The proposed method offers a pluggable layer that can be integrated with all Transformer-based models. With ALPINE, input sequences undergo adaptive compression throughout the pipeline, shrinking to as little as one third of their initial size (up to $3\times$ smaller), which significantly reduces the computational load. Our experiments on two software engineering tasks, defect prediction and code clone detection, across three language models (CodeBERT, GraphCodeBERT, and UniXCoder) show that ALPINE achieves up to a 50% reduction in FLOPs, a 58.1% decrease in memory footprint, and a 28.1% improvement in throughput on average. This translates into a reduction in CO2 emissions of up to $44.85\%$. Importantly, ALPINE achieves this reduction in computational resources while maintaining up to 98.1% of the original predictive performance. These findings highlight the potential of ALPINE to make language models of code more resource-efficient and accessible while preserving their performance, contributing to the overall sustainability of adopting language models in software development. The substantial sequence compression achieved by ALPINE also sheds light on the redundant and noisy information present in source code analysis corpora.
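The pruning idea summarized above (score each token from attention probabilities, then drop tokens whose score falls outside a dynamic range) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a token's importance is the mean attention it receives over all heads and query positions, that the dynamic range is the mean score plus or minus `alpha` standard deviations, and that the first token (for example a `[CLS]`-style token) is always retained. The names `token_importance`, `keep_indices`, `prune`, `alpha`, and `always_keep` are hypothetical.

```python
from statistics import mean, pstdev


def token_importance(attn):
    """Score each token by the average attention it receives.

    attn[h][i][j] is the attention probability from query position i
    to key position j in head h (each row sums to 1).
    Assumption: importance = mean over heads and query positions.
    """
    num_heads = len(attn)
    seq_len = len(attn[0])
    scores = []
    for j in range(seq_len):
        received = [attn[h][i][j] for h in range(num_heads) for i in range(seq_len)]
        scores.append(sum(received) / len(received))
    return scores


def keep_indices(scores, alpha=1.0, always_keep=(0,)):
    """Return positions whose score lies in [mu - alpha*sigma, mu + alpha*sigma].

    Assumption: the range is recomputed from the current scores, so it
    adapts to each input; positions in `always_keep` are never pruned.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    low, high = mu - alpha * sigma, mu + alpha * sigma
    return [j for j, s in enumerate(scores) if j in always_keep or low <= s <= high]


def prune(hidden_states, attn, alpha=1.0):
    """Drop the hidden states of pruned tokens, preserving token order."""
    kept = keep_indices(token_importance(attn), alpha)
    return [hidden_states[j] for j in kept], kept


# Toy input: one head, five tokens, every query attends identically.
toy_attn = [[[0.2, 0.2, 0.2, 0.02, 0.38] for _ in range(5)]]
toy_hidden = ["h0", "h1", "h2", "h3", "h4"]
pruned_hidden, kept_positions = prune(toy_hidden, toy_attn)
print(kept_positions)  # → [0, 1, 2]
```

In this toy case the scores have mean 0.2 and standard deviation of about 0.114, so the tokens scoring 0.02 and 0.38 fall outside the range and are removed, leaving a shorter sequence for the following layer. A larger `alpha` widens the range and prunes fewer tokens.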
