Supervised In-Context Fine-Tuning for Generative Sequence Labeling

David Dukić; Goran Glavaš; Jan Šnajder

TL;DR

The paper tackles sequence labeling (SL) with decoder-based LLMs by introducing supervised in-context fine-tuning (SIFT), a framework that combines supervised fine-tuning with in-context demonstrations under a generative, response-focused objective. Comparing vanilla causal language modeling (CLM), SRC, and multi-response completion (MRC) strategies across four SL tasks and five LLMs, the authors show that MRC with dense demonstrations yields substantial gains over traditional ICL and decoder-as-encoder baselines. They also find that the performance drop caused by long contexts can be mitigated by omitting the instruction, which suggests a practical preference for instruction-free prompts in many SL scenarios. The findings underscore the potential of response-based, generative task formulations for robust SL with LLMs and point to concrete best practices for applying SIFT to real-world NLP tasks.

Abstract

Sequence labeling (SL) tasks, where labels are assigned to tokens, are abundant in NLP (e.g., named entity recognition and aspect-based sentiment analysis). Owing to the intuition that they require bidirectional context, SL tasks are commonly tackled with encoder-only models. Recent work also shows that removing the causal mask in fine-tuning enables decoder-based LLMs to become effective token classifiers. Less work, however, has focused on (supervised) generative SL, a more natural setting for causal LLMs. Due to their rapid scaling, causal LLMs applied to SL are expected to outperform encoders, whose own development has stagnated. In this work, we propose supervised in-context fine-tuning (SIFT) for generative SL. SIFT casts SL tasks as constrained response generation, natural to LLMs, combining in-context learning (ICL) from demonstrations with supervised fine-tuning. SIFT considerably outperforms both ICL and decoder-as-encoder fine-tuning baselines on a range of standard SL tasks. We further find that although long context hinders the performance of generative SL in both ICL and SIFT, this deficiency can be mitigated by removing the instruction, as instructions are shown to be largely unnecessary for achieving strong SL performance with SIFT. Our findings highlight strengths and limitations of SL with LLMs, underscoring the importance of a response-based generative task formulation for effective SL performance.
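
To make the idea of casting SL as response generation with in-context demonstrations more concrete, the following is a minimal sketch, not the authors' implementation. The prompt template (`Sentence:` / `Response:`), the `span -> label` response format, the BIO input tags, and the names `bio_to_spans`, `format_response`, and `build_segments` are all illustrative assumptions. The `multi_response` flag reflects one plausible reading of the response-focused objective: when it is set, the responses of all demonstrations and of the query are marked as loss-bearing; otherwise only the final response is.

```python
from typing import List, Tuple

# An example is (tokens, BIO tags). This input format is an assumption.
Example = Tuple[List[str], List[str]]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (span text, label) pairs from BIO tags.

    An I- tag that does not continue an open span of the same label is ignored.
    """
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


def format_response(tokens: List[str], tags: List[str]) -> str:
    """Render the target response (hypothetical 'span -> label' format)."""
    spans = bio_to_spans(tokens, tags)
    return "; ".join(f"{span} -> {lab}" for span, lab in spans) or "none"


def build_segments(
    demos: List[Example],
    query: Example,
    instruction: str = "",
    multi_response: bool = True,
) -> List[Tuple[str, bool]]:
    """Return (text, contributes_to_loss) segments for one training sequence.

    Prompt text never contributes to the loss. Responses contribute either
    for every example (multi_response=True) or for the query only.
    """
    segments: List[Tuple[str, bool]] = []
    if instruction:  # an empty instruction gives an instruction-free prompt
        segments.append((instruction + "\n\n", False))
    examples = list(demos) + [query]
    for i, (tokens, tags) in enumerate(examples):
        is_query = i == len(examples) - 1
        prompt = "Sentence: " + " ".join(tokens) + "\nResponse:"
        response = " " + format_response(tokens, tags) + "\n\n"
        segments.append((prompt, False))
        segments.append((response, multi_response or is_query))
    return segments


demos = [(["Ana", "lives", "in", "Zagreb"], ["B-PER", "O", "O", "B-LOC"])]
query = (["Goran", "Glavaš", "visited", "Paris"], ["B-PER", "I-PER", "O", "B-LOC"])
segments = build_segments(demos, query)
print("".join(text for text, _ in segments))
```

In an actual fine-tuning setup, the boolean flags would typically be turned into a token-level mask so that only the flagged segments are scored by the loss; how that is done depends on the training library and is left out here.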
