Efficient Methods for Natural Language Processing: A Survey

Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

TL;DR

This survey addresses the resource and energy bottlenecks in modern NLP by cataloging a broad set of efficiency approaches spanning data usage, model design, pre-training, fine-tuning, inference, and hardware. It highlights concrete methods such as data deduplication, active and curriculum learning, sparse modeling, retrieval-augmented architectures, adapters and LoRA for parameter efficiency, pruning, distillation, and quantization, as well as hardware-aware co-design and edge deployment strategies. The work emphasizes that efficiency is multi-faceted, requiring careful evaluation along Pareto fronts with metrics like FLOP/s, power, and carbon emissions, and it calls for standardized reporting and cross-stage analysis to meaningfully compare methods. Overall, the paper outlines concrete, scalable directions for reducing NLP compute without sacrificing performance and stresses the importance of algorithm-hardware co-design for real-world impact.

Abstract

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim both to provide guidance for conducting NLP under limited resources and to point towards promising research directions for developing more efficient methods.

Paper Structure (48 sections, 3 figures)

Figures (3)

  • Figure 1: Exponential growth in the number of parameters in pretrained language models. Adapted from Lakim et al. (2022).
  • Figure 2: Schematic overview of the efficient NLP stages covered in this paper, starting with data collection and model design, followed by training and inference, and ending with evaluation and model selection. Notably, the training stage is divided into two parts: pre-training, which aims to learn generalizable parameters, and fine-tuning, which optimizes these parameters for specific downstream tasks.
  • Figure 3: Typology of efficient NLP methods.