Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It)
Alessio Miaschi, Felice Dell'Orletta, Giulia Venturi
TL;DR
The paper asks whether incorporating linguistic knowledge through intermediate fine-tuning can improve encoder-decoder models on the target task of predicting sentence complexity. It implements a two-step STILTs pipeline in which T5 is first fine-tuned in a multi-task setting on linguistic properties derived from ProfilingUD and then fine-tuned on the complexity task, for Italian and English, using monolingual and multilingual variants and varying data sizes. The main findings show that linguistically informed intermediate fine-tuning yields generally positive gains, with the strongest benefits for smaller models in low-resource settings, and that multilingual and cross-lingual configurations often outperform monolingual baselines. These results point to a practical, data-efficient way to enhance linguistic competence in pre-trained models and motivate further exploration of additional features and instruction-tuning paradigms.
Abstract
In this paper, we explore the impact of augmenting pre-trained Encoder-Decoder models, specifically T5, with linguistic knowledge for the prediction of a target task. In particular, we investigate whether fine-tuning a T5 model on an intermediate task that predicts structural linguistic properties of sentences modifies its performance in the target task of predicting sentence-level complexity. Our study encompasses diverse experiments conducted on Italian and English datasets, employing both monolingual and multilingual T5 models at various sizes. Results obtained for both languages and in cross-lingual configurations show that linguistically motivated intermediate fine-tuning generally has a positive impact on target task performance, especially when applied to smaller models and in scenarios with limited data availability.
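As a rough illustration of the two-stage setup described above, the sketch below casts both stages as text-to-text pairs of the kind a T5 model consumes: one pair per linguistic property for the intermediate stage, and one pair per rated sentence for the target stage. The task prefixes, the feature names (`n_tokens`, `parse_depth`), the two-decimal rounding, and all helper names are assumptions made for illustration; the summary above does not specify how inputs and outputs are actually formatted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (model input text, model target text)


@dataclass
class ProfiledSentence:
    """A sentence with linguistic property values (feature names are hypothetical)."""
    text: str
    features: Dict[str, float]


@dataclass
class RatedSentence:
    """A sentence with a complexity score (scale is hypothetical)."""
    text: str
    complexity: float


def intermediate_pairs(corpus: List[ProfiledSentence]) -> List[Pair]:
    """Stage 1: one pair per (sentence, property), so a single model
    sees all properties in a multi-task fashion. The prefix format is assumed."""
    pairs = []
    for sent in corpus:
        for name, value in sorted(sent.features.items()):
            pairs.append((f"{name}: {sent.text}", f"{value:.2f}"))
    return pairs


def target_pairs(corpus: List[RatedSentence]) -> List[Pair]:
    """Stage 2: one pair per sentence for the complexity task (prefix assumed)."""
    return [(f"complexity: {s.text}", f"{s.complexity:.2f}") for s in corpus]


# Toy data with made-up values, only to show the shape of each stage.
profiled = [ProfiledSentence("The cat sleeps.", {"n_tokens": 4.0, "parse_depth": 2.0})]
rated = [RatedSentence("The cat sleeps.", 1.5)]

stage1 = intermediate_pairs(profiled)  # fine-tune on these first
stage2 = target_pairs(rated)           # then continue fine-tuning on these
```

In a setup like this, the checkpoint produced by fine-tuning on `stage1` would serve as the starting point for fine-tuning on `stage2`, and a baseline would skip `stage1` entirely so that the effect of the intermediate step can be isolated.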
