Elastic Architecture Search for Efficient Language Models
Shang Wang
TL;DR
Large pretrained language models demand substantial compute and energy, motivating compact and efficient architectures. The paper proposes Elastic Language Model (ELM), a neural architecture search framework with a flexible search space (BERT- and MobileBERT-based blocks), dynamic dimension and head search guided by PCA and CKA, and relational knowledge distillation to preserve block diversity. Through extensive experiments on masked and causal language modeling, ELM outperforms existing lightweight NAS methods and achieves competitive or superior results with far fewer parameters and lower latency. This approach offers a practical path to deploy efficient language models at scale while maintaining strong performance.
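For readers unfamiliar with the two measures named above, the following is a minimal sketch of linear CKA (a similarity score between two sets of representations) and of a PCA-based estimate of how many dimensions carry most of the variance. It is not the ELM implementation: the summary does not say how PCA and CKA enter the search, so the function names `linear_cka` and `pca_components_for_variance`, the variance threshold, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    x has shape (n_samples, p) and y has shape (n_samples, q);
    rows must correspond to the same inputs. Returns a value in [0, 1].
    """
    # Center each feature (column) so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def pca_components_for_variance(x, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold` (hypothetical helper)."""
    x = x - x.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional
    # to the variance explained by each principal component.
    s = np.linalg.svd(x, compute_uv=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, threshold)) + 1
    return min(k, len(s))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(64, 8))
    print("CKA(feats, feats):", linear_cka(feats, feats))
    print("components for 95% variance:", pca_components_for_variance(feats))
```

In a search setting, a score like `linear_cka` could be used to compare the outputs of two candidate blocks on the same batch, and a count like the one returned by `pca_components_for_variance` could suggest a reduced width. Both uses are illustrative only.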
Abstract
As large pre-trained language models become increasingly critical to natural language understanding (NLU) tasks, their substantial computational and memory requirements have raised significant economic and environmental concerns. To address these challenges, this paper introduces the Elastic Language Model (ELM), a novel neural architecture search (NAS) method optimized for compact language models. ELM extends existing NAS approaches with a flexible search space built from efficient transformer blocks, together with dynamic modules that adjust the dimension and the number of heads. These innovations make the search process more efficient and flexible, enabling a more thorough and effective exploration of model architectures. We also introduce novel knowledge distillation losses that preserve the unique characteristics of each block, which improves the discrimination between architectural choices during the search. Experiments on masked language modeling and causal language modeling tasks demonstrate that models discovered by ELM significantly outperform existing methods.
