Bringing Emerging Architectures to Sequence Labeling in NLP
Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares
TL;DR
This study broadens the evaluation of sequence labeling in NLP beyond Transformer encoders by systematically testing diffusion tagging, adversarial tagging, xLSTM, and SSD-based models across multilingual PoS, NER, and structured parsing tasks. The results show that adversarial tagging often matches or surpasses Transformer baselines, especially in complex structured settings, while diffusion tagging and structured state-space models underperform in many cases. Non-Transformer encoders such as the bidirectional xLSTM and BiLSTM variants can excel on simpler tagging tasks but do not consistently beat Transformers on harder problems with long-range dependencies. The findings suggest adversarial labeling as a promising direction for robust tagging across diverse linguistic structures, with practical implications for multilingual NLP, where resource-conscious non-Transformer architectures can still deliver competitive performance. Limitations include computational resource demands and the use of MLM encoders rather than generative encoders, both of which constrain the experimental design and how far the results generalize.
Abstract
Pretrained Transformer encoders are the dominant approach to sequence labeling. While some alternative architectures (such as xLSTMs, structured state-space models, diffusion models, and adversarial learning) have shown promise in language modeling, few have been applied to sequence labeling, and mostly on flat or simplified tasks. We study how these architectures adapt across tagging tasks that vary in structural complexity, label space, and token dependencies, with evaluation spanning multiple languages. We find that the strong performance previously observed in simpler settings does not always generalize well across languages or datasets, nor does it extend to more complex structured tasks.
