Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs

Stepan Tytarenko; Mohammad Ruhul Amin

TL;DR

The paper tackles the problem of degraded generalizability from task-specific fine-tuning of large pre-trained language models by introducing a Space Model that performs context attribution: transforming contextual embeddings via a trainable concept operator into class-specific latent concept spaces, with final decisions made from concatenated centroids. The method defines $C^i = \tanh(E P^i)$ and uses class centroids $k_i = \frac{1}{N_s} \sum_j c_{i,j}$ in a centroid-based classifier, while a cross-entropy loss is augmented by an intra-space loss to enforce disjoint concept representations. Empirically, the Space Model yields substantial improvements across HateXplain, IMDB, and Social Media Attributions, including up to 8% higher accuracy and 10% higher F1 without full fine-tuning, and strong zero-shot/generalizability gains (e.g., 7% F1 in cross-dataset tests). The approach also reduces required trainable parameters and stabilizes training, showing practical impact for robust NLP classification across diverse base models and domains, with an open-source PyTorch implementation provided.
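The projection and centroid steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (which is in PyTorch): it uses plain Python so it runs without dependencies, and it assumes that $E$ is the matrix of token embeddings for one sequence, that $N_s$ is the number of tokens, and that $c_{i,j}$ is the $j$-th row of $C^i$. The function names, toy dimensions, and random weights are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x d) matrix by a (d x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def concept_projection(E, P):
    """C^i = tanh(E P^i): project token embeddings into one concept space."""
    return [[math.tanh(v) for v in row] for row in matmul(E, P)]


def centroid(C):
    """k_i = (1 / N_s) * sum_j c_{i,j}: mean of the projected token vectors."""
    n_tokens = len(C)
    return [sum(col) / n_tokens for col in zip(*C)]


def space_features(E, operators):
    """Concatenate the centroids of all class-specific concept spaces."""
    features = []
    for P in operators:
        features.extend(centroid(concept_projection(E, P)))
    return features


# Toy sizes (assumptions for illustration only).
random.seed(0)
n_tokens, d_model, d_space, n_classes = 6, 8, 3, 2

# Stand-in for contextual embeddings from a frozen transformer: (n_tokens x d_model).
E = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_tokens)]

# One concept operator P^i per class: (d_model x d_space). Trainable in the real model.
operators = [
    [[random.gauss(0, 0.5) for _ in range(d_space)] for _ in range(d_model)]
    for _ in range(n_classes)
]

# Concatenated centroids: the input to the final classifier.
features = space_features(E, operators)
```

In this sketch only the concept operators (and whatever classifier consumes `features`) would be trained, which is consistent with the reduced number of trainable parameters described above. The intra-space loss is not shown because its exact form is not given in this summary.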

Abstract

Fine-tuning large pre-trained language models (LLMs) on particular datasets is a commonly employed strategy in Natural Language Processing (NLP) classification tasks. However, this approach usually results in a loss of the model's generalizability. In this paper, we present a framework that maintains generalizability and enhances performance on the downstream task by utilizing task-specific context attribution. We show that a linear transformation of the text representation from any transformer model using the task-specific concept operator results in a projection onto the latent concept space, referred to as context attribution in this paper. The specific concept operator is optimized during the supervised learning stage via novel loss functions. The proposed framework demonstrates that context attribution of the text representation for each task objective can improve the capacity of the discriminator function and thus achieve better performance on the classification task. Experimental results on three datasets, namely HateXplain, IMDB reviews, and Social Media Attributions, illustrate that the proposed model attains superior accuracy and generalizability. Specifically, for the non-fine-tuned BERT on the HateXplain dataset, we observe an 8% improvement in accuracy and a 10% improvement in F1-score. On the IMDB dataset, the proposed model outperforms the fine-tuned state-of-the-art XLNet by 1% in both accuracy and F1-score. Furthermore, in an out-of-domain cross-dataset test, DistilBERT fine-tuned on the IMDB dataset in conjunction with the proposed model improves the F1-score on the HateXplain dataset by 7%. For the Social Media Attributions dataset of YouTube comments, we observe a 5.2% increase in F1-score. The proposed framework is implemented with PyTorch and provided open-source on GitHub.

Paper Structure (17 sections, 5 equations, 3 figures, 5 tables)

Introduction
Related work
Methodology
Context Attribution
Contextual word embeddings
Conceptual projections
Classification
Loss function
Results
Preprocessing and settings
Evaluation Metrics
Experimental Results
Fine-tuning Space Model
Generalizability
Social Media Attribution
...and 2 more sections

Figures (3)

Figure 1: 3D projection of the space embeddings for the 2-class classification. After projecting a sentence onto different concept spaces, we expect the projections to be orthogonal if the classes are completely divergent. For positive versus negative sentiment, we expect the positive-class projection to be orthogonal to the negative-class projection.
Figure 2: 3D projection of the space embeddings for the 3-class classification (HateXplain). As in the 2-class case, we expect three orthogonal projections. Viewing the plot from multiple angles shows that some projections are clearly orthogonal while others are more aligned. This is the effect discussed previously: contextual attributions may have overlapping concepts.
Figure 3: DistilBERT (upper part) vs DistilBERT and Space Model (lower part) stabilization comparison
