
XLM: A Python package for non-autoregressive language models

Dhruvesh Patel, Durga Prasad Maram, Sai Sreenivas Chintha, Benjamin Rozonoyer, Andrew McCallum

TL;DR

XLM addresses the lack of unified tooling for non-autoregressive text generation by introducing a modular Python package built on PyTorch, PyTorch Lightning, and Hydra. It emphasizes design principles of maximal independence, composition over inheritance, and runtime configurability to enable rapid prototyping of small non-autoregressive architectures. The paper details core components (DataModule, Harness, configuration management) and demonstrates an end-to-end ILM implementation on StarEasy with a scaffolding workflow, plus a benchmarking section that reproduces results on synthetic and LM1B tasks. The work argues that XLM lowers barriers to systematic comparisons and accelerates research in non-autoregressive generation, with future plans to extend to more tasks and non-text domains.
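The following is a minimal, framework-free sketch of "composition over inheritance" and "runtime configurability". The toy `Harness` is loosely named after XLM's component. It holds no modeling logic and only delegates to a model, loss, and predictor that are supplied independently. Everything here is a hypothetical illustration, not XLM's actual API: the component signatures, the toy components, and the string-keyed `LOSSES` registry, which stands in for selecting a component in a Hydra config.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical component signatures, kept framework-free for illustration.
Model = Callable[[List[int]], List[float]]
Loss = Callable[[List[float], List[int]], float]
Predictor = Callable[[Model, List[int]], List[int]]


@dataclass
class Harness:
    """Toy harness: owns no modeling logic, only composes and delegates."""

    model: Model
    loss_fn: Loss
    predictor: Predictor

    def training_step(self, inputs: List[int], targets: List[int]) -> float:
        # Forward pass via the model, scoring via the separately supplied loss.
        return self.loss_fn(self.model(inputs), targets)

    def predict(self, prompt: List[int]) -> List[int]:
        # Decoding lives entirely in the swappable predictor.
        return self.predictor(self.model, prompt)


def toy_model(tokens: List[int]) -> List[float]:
    # Stand-in "network": one score per input position.
    return [float(t) for t in tokens]


def squared_error(scores: List[float], targets: List[int]) -> float:
    return sum((s - t) ** 2 for s, t in zip(scores, targets))


def absolute_error(scores: List[float], targets: List[int]) -> float:
    return sum(abs(s - t) for s, t in zip(scores, targets))


def rounding_predictor(model: Model, prompt: List[int]) -> List[int]:
    # Trivial "decoding": round each score to the nearest token id.
    return [round(s) for s in model(prompt)]


# A string-keyed registry stands in for choosing a component in a config file.
LOSSES: Dict[str, Loss] = {"squared": squared_error, "absolute": absolute_error}


def build_harness(loss_name: str) -> Harness:
    # Swapping the loss is a lookup, not a new Harness subclass.
    return Harness(model=toy_model, loss_fn=LOSSES[loss_name], predictor=rounding_predictor)
```

For example, `build_harness("absolute").training_step([1, 2, 3], [1, 2, 5])` scores the same batch with a different loss. No class hierarchy changes, which is the point of preferring composition.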

Abstract

In recent years, there has been a resurgence of interest in non-autoregressive text generation in the context of general language modeling. Unlike the well-established autoregressive language modeling paradigm, which has a plethora of standard training and inference libraries, implementations of non-autoregressive language modeling have largely been bespoke, making it difficult to perform systematic comparisons of different methods. Moreover, each non-autoregressive language model typically requires its own data collation, loss, and prediction logic, making it challenging to reuse common components. In this work, we present the XLM Python package, which is designed to make implementing small non-autoregressive language models faster, with a secondary goal of providing a suite of small pre-trained models (through a companion xlm-models package) that can be used by the research community. The code is available at https://github.com/dhruvdcoder/xlm-core.
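The sketch below illustrates why data collation is hard to share across non-autoregressive model families. It assumes two hypothetical corruption schemes over token-id lists. One is a masking collator in the style of masked-diffusion models. The other is a deletion collator of the kind an insertion-based model might train on. Both satisfy the same `Collator` protocol, yet they emit differently keyed batches, so the downstream loss and prediction logic must change along with them. The class names, the `MASK_ID` value, and the output keys are assumptions for illustration only.

```python
import random
from typing import Dict, List, Protocol

Batch = List[List[int]]
MASK_ID = 0  # hypothetical id reserved for the mask token


class Collator(Protocol):
    """Shared call signature; the returned keys are model-specific."""

    def __call__(self, batch: Batch) -> Dict[str, Batch]: ...


class MaskingCollator:
    """Masked-diffusion-style corruption: replace random positions with MASK_ID."""

    def __init__(self, mask_prob: float, seed: int = 0) -> None:
        self.mask_prob = mask_prob
        self.rng = random.Random(seed)

    def __call__(self, batch: Batch) -> Dict[str, Batch]:
        inputs = [
            [MASK_ID if self.rng.random() < self.mask_prob else tok for tok in seq]
            for seq in batch
        ]
        return {"input_ids": inputs, "targets": [list(seq) for seq in batch]}


class DeletionCollator:
    """Insertion-style corruption: drop random tokens and record, for every gap
    between kept tokens (including both ends), how many tokens were removed."""

    def __init__(self, drop_prob: float, seed: int = 0) -> None:
        self.drop_prob = drop_prob
        self.rng = random.Random(seed)

    def __call__(self, batch: Batch) -> Dict[str, Batch]:
        kept_batch: Batch = []
        gaps_batch: Batch = []
        for seq in batch:
            kept: List[int] = []
            gaps: List[int] = []
            run = 0  # tokens dropped since the last kept token
            for tok in seq:
                if self.rng.random() < self.drop_prob:
                    run += 1
                else:
                    gaps.append(run)
                    kept.append(tok)
                    run = 0
            gaps.append(run)  # trailing gap after the last kept token
            kept_batch.append(kept)
            gaps_batch.append(gaps)
        return {"input_ids": kept_batch, "gap_counts": gaps_batch}


# Each model family registers its own collator behind the shared protocol.
COLLATORS: Dict[str, Collator] = {
    "masked": MaskingCollator(mask_prob=0.5, seed=0),
    "insertion": DeletionCollator(drop_prob=0.5, seed=0),
}
```

A loss written against `targets` cannot consume a `gap_counts` batch. Keeping collator, loss, and predictor as separately swappable components is one way to contain that coupling.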

Paper Structure

This paper contains 49 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Overview of XLM's design. It consists of two classes of components: the core components (Harness and DataModule) and the model-specific components, whose implementations depend on the model logic. These components are defined in configuration files managed by Hydra (see Figure 3), enabling arbitrary component swapping. The Harness is responsible for instantiating all components (model, loss, predictor, etc.) and delegating to their respective functionalities. The DataModule manages multiple datasets across workflow stages using per-dataset objects, each handling one dataset and an appropriate collator.
  • Figure 2: Directory structure generated by the scaffolding script.
  • Figure 3: Configuration tree for a typical experiment (e.g., ILM on a seq2seq planning task using the StarEasy dataset). The experiment config sits at the root of the nesting structure, contains global parameters, and composes the component configs (model, model_type, and datamodule). The model/ilm.yaml file stores the parameters for the model class. The model_type/ilm.yaml file contains the information needed to instantiate the loss function, predictor, and metric components. The datamodule/star_easy_ilm.yaml file composes the dataset and collator configs (here, StarEasy and the seq2seq collators). Note: only partial entries are shown in the figure for brevity.
  • Figure 4: An example of a prompt and target for the StarEasy dataset.
  • Figure 5: The configs/model_type/ilm.yaml config file for the ILM model. It contains sections for the loss function, predictor, and metrics.
  • ...and 4 more figures
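Figures 1 and 3 describe two related ideas. Components are declared in Hydra config files and instantiated from them, and a DataModule pairs each dataset with a collator for each workflow stage. The sketch below shows this pattern in self-contained form, under stated assumptions. It mimics the `_target_` convention of Hydra's `hydra.utils.instantiate`, but it looks names up in a local registry instead of importing dotted paths. Nested dicts stand in for the YAML config tree. Every class (`TinyModel`, `TokenLoss`, `PadCollator`, `DatasetEntry`, and this toy `DataModule`) is hypothetical and is not XLM's API.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List

Batch = List[List[int]]


@dataclass
class TinyModel:  # hypothetical model class (cf. model/ilm.yaml)
    hidden_size: int


@dataclass
class TokenLoss:  # hypothetical loss (cf. model_type/ilm.yaml)
    reduction: str = "mean"


@dataclass
class PadCollator:  # hypothetical collator: right-pads a batch to equal length
    pad_id: int = 0

    def __call__(self, batch: Batch) -> Batch:
        width = max(len(seq) for seq in batch)
        return [seq + [self.pad_id] * (width - len(seq)) for seq in batch]


@dataclass
class DatasetEntry:
    """One dataset paired with the collator that batches it."""

    examples: Batch
    collator: Callable[[Batch], Any]

    def batches(self, batch_size: int) -> Iterator[Any]:
        for start in range(0, len(self.examples), batch_size):
            yield self.collator(self.examples[start:start + batch_size])


@dataclass
class DataModule:
    """Toy stand-in: several datasets per workflow stage (train, val, ...)."""

    stages: Dict[str, List[DatasetEntry]]


# Hydra imports the dotted path in `_target_`; this sketch uses a local registry.
REGISTRY: Dict[str, Callable[..., Any]] = {
    cls.__name__: cls for cls in (TinyModel, TokenLoss, PadCollator, DatasetEntry, DataModule)
}


def instantiate(node: Any) -> Any:
    """Recursively build objects from dicts carrying a `_target_` key."""
    if isinstance(node, dict):
        kwargs = {k: instantiate(v) for k, v in node.items() if k != "_target_"}
        return REGISTRY[node["_target_"]](**kwargs) if "_target_" in node else kwargs
    if isinstance(node, list):
        return [instantiate(v) for v in node]
    return node


def override(cfg: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Return a copy of cfg with one value replaced, akin to `model.hidden_size=16`."""
    new_cfg = copy.deepcopy(cfg)
    *parents, leaf = dotted_key.split(".")
    node = new_cfg
    for key in parents:
        node = node[key]
    node[leaf] = value
    return new_cfg


# Nested dicts mirror the experiment -> {model, model_type, datamodule} tree.
EXPERIMENT: Dict[str, Any] = {
    "model": {"_target_": "TinyModel", "hidden_size": 8},
    "model_type": {"loss": {"_target_": "TokenLoss"}},
    "datamodule": {
        "_target_": "DataModule",
        "stages": {
            "train": [
                {
                    "_target_": "DatasetEntry",
                    "examples": [[5, 6, 7], [8]],
                    "collator": {"_target_": "PadCollator", "pad_id": 0},
                }
            ]
        },
    },
}
```

Calling `instantiate(EXPERIMENT)` builds the whole tree in one pass. `override` plays the role of a Hydra command-line override such as `model.hidden_size=16`: it swaps a value at run time without touching code.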