Assessing the Impact of Sequence Length Learning on Classification Tasks for Transformer Encoder Models

Jean-Thomas Baillargeon; Luc Lamontagne

TL;DR

This paper addresses the problem that sequence length differences between classes can serve as a spurious predictive feature in transformer-based text classifiers. It introduces an empirical protocol to inject and detect sequence length learning across four datasets and several transformer architectures. Two data-centric mitigation strategies are evaluated: removing observations outside the overlapping length region and data augmentation using a masked language model to increase overlap. Findings show that length-based shortcuts can dominate predictions under imbalanced length distributions, but mitigation can reduce reliance on length and restore content-based decision making, with practical implications for private-domain NLP where length bias may be present.
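The first mitigation strategy, removing observations outside the overlapping length region, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `overlap_bounds` and `keep_overlap` are hypothetical, and lengths are counted in whitespace-separated tokens as a simplifying assumption (a real pipeline would more likely use the model's tokenizer).

```python
from typing import Dict, List, Tuple


def overlap_bounds(lengths: List[int], labels: List[int]) -> Tuple[int, int]:
    """Return (low, high), the length range covered by every class."""
    by_class: Dict[int, List[int]] = {}
    for n, y in zip(lengths, labels):
        by_class.setdefault(y, []).append(n)
    # The shared region starts at the largest per-class minimum
    # and ends at the smallest per-class maximum.
    low = max(min(v) for v in by_class.values())
    high = min(max(v) for v in by_class.values())
    return low, high


def keep_overlap(texts: List[str], labels: List[int]) -> List[Tuple[str, int]]:
    """Keep only observations whose length falls inside the shared region."""
    # Assumption: length is measured in whitespace-separated tokens.
    lengths = [len(t.split()) for t in texts]
    low, high = overlap_bounds(lengths, labels)
    # If the classes do not overlap at all (low > high), nothing is kept.
    return [(t, y) for t, y, n in zip(texts, labels, lengths) if low <= n <= high]
```

The filter trades training data for balance: every surviving length is observed in all classes, so length alone cannot separate them by range, although class-specific density differences inside the region can remain.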

Abstract

Classification algorithms using Transformer architectures can be affected by the sequence length learning problem whenever observations from different classes have a different length distribution. This problem causes models to use sequence length as a predictive feature instead of relying on important textual information. Although most public datasets are not affected by this problem, privately owned corpora for fields such as medicine and insurance may carry this data bias. The exploitation of this sequence length feature poses challenges throughout the value chain as these machine learning models can be used in critical applications. In this paper, we empirically expose this problem and present approaches to minimize its impacts.
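A quick way to check whether a binary corpus carries this bias is to ask how well sequence length alone predicts the class. The sketch below is an assumed sanity check, not the detection protocol used in the paper: the hypothetical helper `best_length_threshold_accuracy` searches for the single length threshold that best separates two classes labelled 0 and 1.

```python
from typing import List


def best_length_threshold_accuracy(lengths: List[int], labels: List[int]) -> float:
    """Best accuracy of the rule 'predict 1 if length > t', or its inverse."""
    n = len(labels)
    best = 0.0
    for t in sorted(set(lengths)):
        # Count observations the rule 'length > t means class 1' gets right.
        correct = sum((length > t) == bool(y) for length, y in zip(lengths, labels))
        # The inverse rule gets exactly the remaining observations right.
        best = max(best, correct / n, (n - correct) / n)
    return best
```

An accuracy close to 1.0 suggests that a classifier could solve the task from length alone, while a value near the majority-class rate suggests that length carries little signal.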

Paper Structure (16 sections, 2 figures, 9 tables)

Introduction
Related Work
Assessing the Impact of Sequence Length Learning
Datasets
Alteration of Training Datasets to Inject Sequence Length Imbalance
Generation of Test Subsets
Evaluation of the Impact of the Sequence Length Feature
Evaluation of Sequence Length Learning for Partial Class Overlap
Source of Sequence Length Learning in Transformer Layers
Sequence Length Learning for Different Transformer Encoder Architectures
Alleviating the Impact of Sequence Length Learning
Removing Problematic Observations
Augmenting Training Data Using LM
Data Augmentation Approach
Results with Augmented Training Data
...and 1 more section

Figures (2)

Figure 1: (a) Gap-test partitioning
Figure 3: (a) Original distribution
