SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations

Jesus Solano; Mardhiyah Sanni; Oana-Maria Camburu; Pasquale Minervini

TL;DR

SparseFit combines sparse fine-tuning with prompt-based conditioning to jointly generate predictions and natural language explanations (NLEs) in few-shot settings. By selectively updating small fractions of parameters in the encoder, decoder, LM head, attention layers, and layer normalization components, SparseFit achieves competitive task performance and NLE quality across three sizes of T5 and four NLE datasets, often surpassing full fine-tuning and other PEFT baselines. The approach also scales to larger models (e.g., Llama 2-7B) with consistent gains in NLE quality, though some configurations still produce empty or low-quality explanations, which is attributed to pretraining and to the limited amount of fine-tuning. Overall, SparseFit shows that updating as little as ~6.8% of a model's parameters can approach, and in some cases exceed, the performance of full fine-tuning, offering a practical and efficient path to explainable NLP systems in data-constrained regimes.

Abstract

Models that generate natural language explanations (NLEs) for their predictions have recently gained increasing interest. However, this approach usually demands large datasets of human-written NLEs for the ground-truth answers at training time, which can be expensive and potentially infeasible for some applications. When only a few NLEs are available (a few-shot setup), fine-tuning pre-trained language models (PLMs) in conjunction with prompt-based learning has recently shown promising results. However, PLMs typically have billions of parameters, making full fine-tuning expensive. We propose SparseFit, a sparse few-shot fine-tuning strategy that leverages discrete prompts to jointly generate predictions and NLEs. We experiment with SparseFit on three sizes of the T5 language model and four datasets and compare it against existing state-of-the-art Parameter-Efficient Fine-Tuning (PEFT) techniques. We find that fine-tuning only 6.8% of the model parameters leads to competitive results for both the task performance and the quality of the generated NLEs compared to full fine-tuning of the model and produces better results on average than other PEFT methods in terms of predictive accuracy and NLE quality.
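The core idea of sparse fine-tuning, freezing most of the model and updating only the parameters of one chosen component, can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names, sizes, and helper functions below are hypothetical and chosen only to show how a name-based selection determines the fraction of trainable parameters. In a framework such as PyTorch, the resulting mask would typically be applied by setting `requires_grad` to `False` on the frozen parameters.

```python
from typing import Dict, Iterable


def select_trainable(param_sizes: Dict[str, int], patterns: Iterable[str]) -> Dict[str, bool]:
    """Mark a parameter as trainable if its name contains any pattern; all others stay frozen."""
    patterns = tuple(patterns)
    return {name: any(p in name for p in patterns) for name in param_sizes}


def trainable_fraction(param_sizes: Dict[str, int], mask: Dict[str, bool]) -> float:
    """Share of all parameters that the mask leaves trainable."""
    total = sum(param_sizes.values())
    trainable = sum(size for name, size in param_sizes.items() if mask[name])
    return trainable / total


# Toy parameter inventory. Names and sizes are made up for illustration;
# they are not taken from T5 or from the paper.
toy_params = {
    "encoder.block.0.attention.q.weight": 400,
    "encoder.block.0.layer_norm.weight": 20,
    "decoder.block.0.attention.q.weight": 400,
    "decoder.block.0.layer_norm.weight": 20,
    "lm_head.weight": 160,
}

# Fine-tune only the layer normalization parameters; everything else is frozen.
mask = select_trainable(toy_params, ["layer_norm"])
fraction = trainable_fraction(toy_params, mask)
print(f"trainable share: {fraction:.1%}")
```

Swapping the pattern list (for example `["attention"]` or `["lm_head"]`) selects a different component, which is how the different sparse configurations described above would differ from one another under this sketch.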

Paper Structure

This paper contains 36 sections, 31 figures, 8 tables.

Introduction
Related Work
Parameter-Efficient Fine-Tuning
Explainability of Neural Models
SparseFit
Encoder
Decoder
LM Head
Attention Layer
Layer Normalization
Experiments
Datasets
Few-shot Learning Data Splits
Training Procedure
Automatic Evaluation
...and 21 more sections

Figures (31)

Figure 1: Distribution of the normalized BERTScore for different SparseFit settings of sparse fine-tuning for T5-large. The percentage of fine-tuned parameters is shown between brackets.
Figure 2: Illustration of plausibility score given by human annotators to the quality of the NLEs generated by different SparseFit configurations. The annotators were asked to answer the question: "Does the explanation justify the answer?"
Figure 3: Histogram of the shortcomings of the generated NLEs for the baseline and the best-performing SparseFit configurations, aggregated over all the datasets.
Figure 4: Examples of generated NLEs for e-SNLI (Green), ECQA (Blue), SBIC (Red), and ComVE (Yellow).
Figure 5: Illustration of the active trainable parameters in T5 when SparseFit is performed over the layer normalization.
...and 26 more figures
