Prompting and Fine-Tuning Open-Sourced Large Language Models for Stance Classification

Iain J. Cruickshank; Lynnette Hui Xian Ng

TL;DR

LLMs do not routinely outperform smaller supervised machine learning models on stance classification, so the authors call for stance detection to become a benchmark that LLMs are also optimized for.

Abstract

Stance classification, the task of predicting the viewpoint of an author on a subject of interest, has long been a focal point of research in domains ranging from social science to machine learning. Current stance detection methods rely predominantly on manual annotation of sentences, followed by training a supervised machine learning model. However, manual annotation is laborious, which hampers the ability of these methods to generalize across different contexts. In this work, we investigate the use of Large Language Models (LLMs) as a stance detection methodology that can reduce or even eliminate the need for manual annotations. We investigate 10 open-source models and 7 prompting schemes, finding that LLMs are competitive with in-domain supervised models but are not necessarily consistent in their performance. We also fine-tuned the LLMs, but found that fine-tuning does not necessarily lead to better performance. In general, we find that LLMs do not routinely outperform their smaller supervised machine learning counterparts, and thus call for stance detection to be a benchmark that LLMs are also optimized for. The code used in this study is available at https://github.com/ijcruic/LLM-Stance-Labeling

Paper Structure (26 sections, 4 figures, 6 tables)

This paper contains 26 sections, 4 figures, 6 tables.

Introduction
Objectives and Contributions
Related Work
Stance Classification Task
The use of Large Language Models for Stance Classification
Stance Classification for Social Media
Data Sets
Methodology
Large Language Models Used
Prompting Schemes for Stance Classification
Fine-Tuning an LLM for Stance Classification
Evaluation Metrics
Results
Results of LLM & Prompting Schemes Combinations
Results of Fine-Tuning LLMs
...and 11 more sections

Figures (4)

Figure 1: Overarching Prompting Scheme for Stance Classification. Text highlights indicate the information available for each of the prompting schemes. Black text indicates the basic task-only prompt, whose elements are common to all of the other prompts. Blue indicates the addition of the task definition. Yellow indicates the addition of context, whereas orange marks the context-question variation of the context prompt. Green indicates the prompt additions from few-shot prompting. Finally, red indicates the additional reasoning or framing prompts introduced in prompting methods like CoT and CoDA, which attempt to elicit preliminary reasoning about the statement before classifying its stance. Words in square brackets are dataset-dependent, while those in curly braces are instance-dependent.
Figure 2: Graphical Representation of all results by F1 score accuracy
Figure 3: Graphical presentation of the proportion of outputs that return a stance prediction
Figure 4: Proportion of entries for which the LLM correctly predicted the stance, given that it returned a valid stance.
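
The layered prompt structure described in the Figure 1 caption can be sketched in code. The following is a minimal illustration only, not the authors' implementation: the function name `build_prompt`, its parameters, and all prompt wording are hypothetical, and the paper's actual prompt text is not reproduced. It assumes that the task-only prompt is always present and that the definition, context, few-shot examples, and reasoning request are optional additions.

```python
from typing import Optional, Sequence, Tuple


def build_prompt(
    statement: str,                      # instance-dependent: the text to classify
    target: str,                         # dataset-dependent: the subject of the stance
    labels: Sequence[str],               # dataset-dependent: allowed stance labels
    definition: Optional[str] = None,    # optional task definition
    context: Optional[str] = None,       # optional context about the statement
    examples: Sequence[Tuple[str, str]] = (),  # optional few-shot (text, label) pairs
    reason_first: bool = False,          # optional request for reasoning before the label
) -> str:
    """Assemble a stance prompt from optional components (hypothetical wording)."""
    parts = []
    if definition:
        parts.append(definition)
    if context:
        parts.append(f"Context: {context}")
    # Few-shot examples come before the statement to be classified.
    for ex_text, ex_label in examples:
        parts.append(f"Statement: {ex_text}\nStance: {ex_label}")
    # The task-only core, common to every variant.
    options = ", ".join(labels)
    parts.append(
        f"Classify the stance of the following statement toward {target} "
        f"as one of: {options}."
    )
    parts.append(f"Statement: {statement}")
    # Either elicit reasoning first or ask for the label directly.
    parts.append("Reasoning:" if reason_first else "Stance:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "Masks work.",
        target="face masks",
        labels=["for", "against", "neutral"],
        context="A social media post.",
    ))
```

Keeping each component optional makes it straightforward to generate every prompt variant from the same function and compare them on the same data.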
