Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU

Patrick Kahardipraja, Brielen Madureira, David Schlangen

TL;DR

This work investigates incremental NLU with Transformer models by evaluating a Linear Transformer (LT) equipped with a recurrence mechanism. It compares a baseline restart-incremental Transformer with several LT-based variants (LT, LT+R, LT+R+CM, LT+R+CM+D) and additionally trains with input prefixes and delayed outputs. Across nine English datasets for tagging and classification, LT+R+CM achieves better incremental metrics and much faster inference, at some cost in non-incremental quality that delays and prefix training can mitigate. The results highlight the importance of temporal order in incremental processing and show that recurrence yields monotonic, stable partial outputs, making LT a compelling option for real-time incremental NLU in interactive systems.

Abstract

Incremental processing allows interactive systems to respond based on partial inputs, which is a desirable property e.g. in dialogue agents. The currently popular Transformer architecture inherently processes sequences as a whole, abstracting away the notion of time. Recent work attempts to apply Transformers incrementally via restart-incrementality by repeatedly feeding, to an unchanged model, increasingly longer input prefixes to produce partial outputs. However, this approach is computationally costly and does not scale efficiently for long sequences. In parallel, we witness efforts to make Transformers more efficient, e.g. the Linear Transformer (LT) with a recurrence mechanism. In this work, we examine the feasibility of LT for incremental NLU in English. Our results show that the recurrent LT model has better incremental performance and faster inference speed compared to the standard Transformer and LT with restart-incrementality, at the cost of part of the non-incremental (full sequence) quality. We show that the performance drop can be mitigated by training the model to wait for right context before committing to an output and that training with input prefixes is beneficial for delivering correct partial outputs.
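The restart-incremental procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_tagger` is a hypothetical rule-based stand-in for a trained model, and the helper names are assumptions made for illustration. The sketch re-runs an unchanged tagger on every prefix and counts how many tokens the model has to process in total.

```python
from typing import Callable, List, Sequence

# A tagger maps a token sequence to one label per token.
Tagger = Callable[[Sequence[str]], List[str]]


def restart_incremental(tokens: Sequence[str], tagger: Tagger) -> List[List[str]]:
    """Re-run the unchanged tagger on each increasingly longer prefix."""
    return [tagger(tokens[:t]) for t in range(1, len(tokens) + 1)]


def tokens_processed_restart(n: int) -> int:
    """Tokens fed to the model over all prefixes: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def toy_tagger(prefix: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a trained model (assumption, not from the paper).

    'new' is labelled LOC only when 'york' follows it, so its label
    depends on right context that may not be available yet.
    """
    labels = []
    for i, tok in enumerate(prefix):
        nxt = prefix[i + 1] if i + 1 < len(prefix) else None
        if tok == "york" or (tok == "new" and nxt == "york"):
            labels.append("LOC")
        else:
            labels.append("O")
    return labels


partial = restart_incremental(["a", "new", "york", "trip"], toy_tagger)
for step, labels in enumerate(partial, start=1):
    print(step, labels)
print("tokens processed:", tokens_processed_restart(4))
```

In this toy run the label for "new" is first emitted as `O` and revised to `LOC` once "york" arrives, which is the kind of revision that waiting for right context is meant to avoid. The total of 1 + 2 + ... + n processed tokens grows quadratically in the sequence length, whereas a model that carries a recurrent state forward consumes each token once.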

Paper Structure

This paper contains 13 sections, 2 equations, 2 figures, 14 tables.

Figures (2)

  • Figure 1: Incremental evaluation on the test sets. EO, CT and RC $\in [0, 1]$; y-axes are clipped to improve readability. Lower is better for EO and CT, higher is better for RC. For EO, the lines on the bars refer to original, delay=1 and delay=2, from top to bottom, and vice versa for RC, showing that delay improves the results. LT+R+CM performs better than the baseline and LT.
  • Figure 2: Incremental inference speed of the models from the benchmark table with increasing sequence length. LT+R+CM scales linearly with sequence length, unlike the baseline and LT. Note that the incremental inference speed of LT+R+CM is similar to that of LT+R.