Contrast Is All You Need

Burak Kilic; Florix Bex; Albert Gatt

TL;DR

This work tackles data-scarce, imbalanced legal text classification by comparing SetFit, a contrastive finetuning approach, with vanilla finetuning on the LEDGAR dataset. SetFit is a two-stage process: contrastive finetuning of a Sentence Transformer, followed by training a logistic classification head. Decision rationales are analyzed with LIME to assess how much each model relies on legally informative features. Results show that SetFit yields better or comparable performance with far fewer labeled samples, and LIME reveals that the contrastive objective weights legally relevant features more strongly than standard finetuning does. The findings support using contrastive, data-efficient methods for legal NLP tasks and highlight the value of explainability analyses in assessing model trustworthiness and feature use.

Abstract

In this study, we analyze data-scarce classification scenarios, where the available labeled legal data is small and imbalanced, potentially hurting the quality of the results. We focus on two finetuning objectives on a legal provision classification task: SetFit (Sentence Transformer Finetuning), a contrastive learning setup, and a vanilla finetuning setup. Additionally, we compare the features extracted with LIME (Local Interpretable Model-agnostic Explanations) to see which particular features contribute to the model's classification decisions. The results show that a contrastive setup with SetFit performs better than vanilla finetuning while using a fraction of the training samples. LIME results show that the contrastive learning approach helps boost both positive and negative features that are legally informative and contribute to the classification results. Thus, a model finetuned with a contrastive objective seems to base its decisions more confidently on legally informative features.
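
To make the contrastive setup more concrete, the following is a minimal sketch of how a handful of labeled provisions can be expanded into sentence pairs for contrastive finetuning: pairs with the same label get target 1.0, pairs with different labels get target 0.0. It is an illustration under stated assumptions, not the authors' pipeline or the SetFit library's code. The function name `make_contrastive_pairs`, the parameter `num_pairs_per_kind`, the example texts, and the label "Other" are all hypothetical; only the "Adjustments" label comes from the figure captions of the paper.

```python
import itertools
import random


def make_contrastive_pairs(texts, labels, num_pairs_per_kind, seed=0):
    """Build (text_a, text_b, target) triples from a few labeled examples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration only; real SetFit pair
    sampling may differ in how pairs are drawn and balanced.
    """
    rng = random.Random(seed)
    indexed = list(zip(texts, labels))
    # Enumerate every unordered pair of labeled examples.
    all_pairs = list(itertools.combinations(indexed, 2))
    positives = [(a, b, 1.0) for (a, la), (b, lb) in all_pairs if la == lb]
    negatives = [(a, b, 0.0) for (a, la), (b, lb) in all_pairs if la != lb]
    rng.shuffle(positives)
    rng.shuffle(negatives)
    # Keep at most the same number of positive and negative pairs.
    return positives[:num_pairs_per_kind] + negatives[:num_pairs_per_kind]


# Made-up provisions; "Other" is a placeholder label, not a LEDGAR class.
texts = [
    "The exercise price shall be adjusted upon any stock split.",
    "The number of shares is subject to adjustment for dividends.",
    "This agreement is governed by the laws of the State.",
    "Notices shall be delivered in writing to each party.",
]
labels = ["Adjustments", "Adjustments", "Other", "Other"]

pairs = make_contrastive_pairs(texts, labels, num_pairs_per_kind=2)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "|", text_b)
```

Even four labeled examples produce six candidate pairs, which is one way to see why a pair-based objective can get more training signal out of few labels. The second stage, fitting a classification head on the embeddings of the finetuned encoder, is omitted here because it requires a pretrained Sentence Transformer model.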

Paper Structure (17 sections, 1 equation, 12 figures, 3 tables)

Introduction
Related Work
SetFit: Sentence Transformer Finetuning
ST finetuning
Classification head training
Inference
Data
Data source
Crawling and balancing
Experiments
Models
Experimental Setup
Results
F1-score comparisons: Original dataset
Accuracy comparisons: Original and balanced dataset
...and 2 more sections

Figures (12)

Figure 1: Accuracy comparison between SetFit and Vanilla finetuning, original LEDGAR dataset
Figure 2: Accuracy comparison between SetFit and Vanilla finetuning, balanced LEDGAR dataset
Figure 3: SetFit vs Vanilla finetuning, common positive LIME features comparison for Adjustments provision
Figure 4: SetFit finetuning positive LIME features for Adjustments provision
Figure 5: Vanilla finetuning positive LIME features for Adjustments provision
...and 7 more figures
