Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It)

Alessio Miaschi, Felice Dell'Orletta, Giulia Venturi

TL;DR

The paper asks whether incorporating linguistic knowledge via intermediate fine-tuning can improve encoder-decoder models on the target task of predicting sentence complexity. It implements a two-step STILTs pipeline in which T5 is first fine-tuned, in a multi-task fashion, on linguistic properties derived from ProfilingUD and then fine-tuned on the complexity target. Experiments cover Italian and English, monolingual and multilingual model variants, and varying data sizes. The main findings are that linguistically informed intermediate fine-tuning yields generally positive gains, with the strongest benefits for smaller models in low-resource settings, and that multilingual and cross-lingual configurations often outperform monolingual baselines. These results point to a practical, data-efficient way to enhance linguistic competence in pre-trained models and motivate further exploration of additional features and instruction-tuning paradigms.

Abstract

In this paper, we explore the impact of augmenting pre-trained Encoder-Decoder models, specifically T5, with linguistic knowledge for the prediction of a target task. In particular, we investigate whether fine-tuning a T5 model on an intermediate task that predicts structural linguistic properties of sentences modifies its performance on the target task of predicting sentence-level complexity. Our study encompasses diverse experiments conducted on Italian and English datasets, employing both monolingual and multilingual T5 models of various sizes. Results obtained for both languages and in cross-lingual configurations show that linguistically motivated intermediate fine-tuning generally has a positive impact on target task performance, especially when applied to smaller models and in scenarios with limited data availability.
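
To make the setup more concrete, the following is a minimal sketch of two ingredients the summary mentions: casting a sentence-level regression example as a text-to-text pair suitable for an encoder-decoder model, and scoring predictions with the Spearman correlation coefficient reported in the figures. The task-prefix format, the two-decimal target rendering, and the feature name `n_tokens` are assumptions made purely for illustration; they are not taken from the paper.

```python
from statistics import mean


def to_text_pair(task, sentence, value):
    """Cast one regression example as a (source, target) string pair.

    Assumption: a "task: sentence" prefix and a two-decimal target string;
    the paper's actual input/output format may differ.
    """
    return f"{task}: {sentence}", f"{value:.2f}"


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all items tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(gold, predicted):
    """Spearman correlation, computed as the Pearson correlation of ranks.

    Undefined (division by zero) if either input is constant.
    """
    rg, rp = average_ranks(gold), average_ranks(predicted)
    mg, mp = mean(rg), mean(rp)
    cov = sum((a - mg) * (b - mp) for a, b in zip(rg, rp))
    var_g = sum((a - mg) ** 2 for a in rg)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / (var_g * var_p) ** 0.5
```

In a two-step pipeline of this kind, pairs such as those produced by `to_text_pair` would be used first for the intermediate linguistic tasks and then for the complexity target, with `spearman` comparing the parsed model outputs against the gold scores at each stage.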

Paper Structure (25 sections, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Illustrated example of the proposed methodology. T5 is first fine-tuned on a subset of linguistic intermediate tasks in a multitask fashion. Then, the newly obtained model, LiT5, is tested on the target task.
  • Figure 2: Spearman correlation coefficients for the intermediate tasks for the Italian (top) and English (bottom) datasets obtained with the monolingual T5 models. Each column in the heatmaps contains the results obtained by the models fine-tuned for multiple epochs (e.g. 5 = 5 epochs of fine-tuning).
  • Figure 3: Spearman correlation coefficients for the intermediate tasks for the Italian (top) and English (bottom) datasets obtained with the multilingual T5 models.
  • Figure 4: Spearman correlation coefficients for the target tasks for the Italian (top) and English (bottom) datasets obtained with the monolingual models using pre-trained and LiT5 models.
  • Figure 5: Spearman correlation coefficients for the target tasks for the Italian (top) and English (bottom) datasets obtained with the mt5-* models. Intermediate scores are reported for the models fine-tuned for Italian and English for the minimum (5) and maximum (25) number of epochs.
  • ...and 3 more figures