Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis

Yan Cathy Hua, Paul Denny, Jörg Wicker, Katerina Taškova

TL;DR

This work addresses data-efficiency and evaluation gaps in aspect-based sentiment analysis (ABSA) for high-demand, low-resource domains. It introduces FTS-OBP, a flexible evaluation framework that tolerates extraction boundary differences while still requiring exact matches for category and sentiment labels. It systematically studies small decoder-only language models (SLMs) with data-free in-context learning, data-light multitask LoRA-SFT, and weight merging (SLERP) on EduRABSA and benchmark ABSA datasets. The findings indicate that 0-shot or few-shot prompting with large GLMs and multitask LoRA-SFT with as few as 200-1,000 labeled examples can rival or surpass larger models, with weight merging providing additional gains. The EduRABSA resource release further supports research in education-domain ABSA. Overall, the results suggest that resource-efficient adaptation of SLMs, together with a new evaluation paradigm, enables effective ABSA in low-resource domains and offers practical pathways for education and healthcare applications.

Abstract

Aspect-based Sentiment Analysis (ABSA) is a fine-grained opinion mining approach that identifies and classifies opinions associated with specific entities (aspects) or their categories within a sentence. Despite its rapid growth and broad potential, ABSA research and resources remain concentrated in commercial domains, leaving analytical needs unmet in high-demand yet low-resource areas such as education and healthcare. Domain adaptation challenges and the reliance of most existing methods on resource-intensive in-training knowledge injection further hinder progress in these areas. Moreover, traditional evaluation methods based on exact matches are overly rigid for ABSA tasks: they penalise any boundary variation, which may misrepresent the performance of generative models. This work addresses these gaps through three contributions: 1) We propose a novel evaluation method, Flexible Text Similarity Matching and Optimal Bipartite Pairing (FTS-OBP), which accommodates realistic extraction boundary variations while maintaining strong correlation with traditional metrics and offering fine-grained diagnostics. 2) We present the first ABSA study of small decoder-only generative language models (SLMs; <7B parameters), examining resource lower bounds via a case study in education review ABSA. We systematically explore data-free (in-context learning and weight merging) and data-light fine-tuning methods, and propose a multitask fine-tuning strategy that significantly enhances SLM performance, enabling 1.5-3.8B models to surpass proprietary large models and approach benchmark results with only 200-1,000 examples on a single GPU. 3) We release the first public set of education review ABSA resources to support future research in low-resource domains.
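
As a rough illustration of the data-free weight merging mentioned above (the TL;DR names SLERP), the following is a generic sketch of spherical linear interpolation between two flattened weight vectors. It is not the paper's merge configuration: the function name `slerp`, the interpolation factor `t`, and the fallback tolerance `eps` are assumptions made for this example, and real merges operate on full model tensors rather than short lists.

```python
import math


def slerp(w0, w1, t, eps=1e-6):
    """Spherical linear interpolation between two flattened weight vectors.

    t = 0 returns w0 and t = 1 returns w1. `eps` is an assumed tolerance below
    which the vectors are treated as parallel and plain linear interpolation
    is used instead.
    """
    norm0 = math.sqrt(sum(x * x for x in w0))
    norm1 = math.sqrt(sum(x * x for x in w1))
    if norm0 == 0.0 or norm1 == 0.0:
        # Angle is undefined for a zero vector: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(w0, w1)]

    # Cosine of the angle between the two vectors, clamped for numerical safety.
    cos_omega = sum(a * b for a, b in zip(w0, w1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    if abs(sin_omega) < eps:
        # Nearly parallel (or anti-parallel) vectors: linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(w0, w1)]

    s0 = math.sin((1 - t) * omega) / sin_omega
    s1 = math.sin(t * omega) / sin_omega
    return [s0 * a + s1 * b for a, b in zip(w0, w1)]


# Halfway between two orthogonal unit vectors lies on the unit circle.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike plain averaging, this interpolation follows the arc between the two vectors, so for two unit-length inputs the result keeps unit length.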

Paper Structure

This paper contains 45 sections, 16 figures, 11 tables.

Figures (16)

  • Figure 1: Example of the FTS-OBP evaluation method on the ASQE task, with one review entry and two ground-truth (gold) vs. three model output (pred) quadruplets (units). FTS matching uses exact matches for category (C) and sentiment (S) labels, and Rouge-L $F_1$ scores with a threshold for aspect (A) and opinion (O) extractions. The "if" table shows how main-category-match partial scores can assist OBP selection.
  • Figure 2: Macro-$F_1$ scores on five ABSA tasks (OE, AOPE, AOC, ASTE, ASQE) for pre-trained GLMs and SLMs, and LoRA-SFT and LoRA weight-merged SLMs on the EduRABSA dataset (300 test examples per task), with 0-shot (0S) and 4-shot (4S) prompt input. $\Delta$ = 4S - 0S score (> 0.15). The LoRA models were fine-tuned and tested with identical prompts. The merged models were based on 4S LoRA checkpoints.
  • Figure D.1: Correlation and differences between macro-$F_1$ scores from FTS-OBP and exact-match-based evaluation ("Exact-match") on outputs from 34 model-prompt pairs across 5 ABSA tasks (OE, AOPE, AOC, ASTE, ASQE) using the EduRABSA test dataset.
  • Figure D.2: Macro-$F_1$ scores computed with FTS-OBP and exact-match-based method ("Exact-match") from 34 model-prompt pairs on 5 ABSA subtasks (OE, AOPE, AOC, ASTE, ASQE) using the EduRABSA test dataset. The models include pre-trained, LoRA SFT ("LoRA_"), and LoRA weight-merged ("Merged_") GLMs and SLMs, with 0-shot (0S) and 4-shot (4S) prompt inputs. The upper section of each bar represents the score difference ($\Delta$ = FTS-OBP $-$ Exact-match, mean($\Delta$) = 0.156).
  • Figure D.3: Macro-$F_1$ scores computed with FTS-OBP and exact-match-based method ("Exact-match") from 34 model-prompt pairs across 5 ABSA subtasks (OE, AOPE, AOC, ASTE, ASQE) using the EduRABSA test dataset. The models include pre-trained GLMs and SLMs, and LoRA SFT ("LoRA_") and LoRA weight-merged ("Merged_") SLMs, with 0-shot (0S) and 4-shot (4S) prompts. The top section of each bar shows the score difference ($\Delta$ = FTS-OBP $-$ Exact-match).
  • ...and 11 more figures
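
The matching scheme described in the Figure 1 caption can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the threshold value `THRESHOLD = 0.5`, the pairing weight (a plain sum of the four component similarities), the dictionary keys, and all function names are assumptions, and the main-category partial scores mentioned in the caption are not modelled. Pairing is done by brute force, which is only practical for a handful of units; a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice at scale.

```python
from itertools import permutations

THRESHOLD = 0.5  # assumed Rouge-L F1 threshold; the value used in the paper is not given here


def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(pred, gold):
    """Rouge-L F1 on whitespace tokens (no stemming, for simplicity)."""
    p, g = pred.lower().split(), gold.lower().split()
    lcs = lcs_len(p, g)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(g)
    return 2 * precision * recall / (precision + recall)


def is_match(pred, gold):
    """Exact match on labels C and S; thresholded Rouge-L on spans A and O."""
    labels_ok = pred["C"] == gold["C"] and pred["S"] == gold["S"]
    spans_ok = all(rouge_l_f1(pred[k], gold[k]) >= THRESHOLD for k in ("A", "O"))
    return labels_ok and spans_ok


def pair_weight(pred, gold):
    """Assumed pairing weight: sum of the four component similarities."""
    return (float(pred["C"] == gold["C"]) + float(pred["S"] == gold["S"])
            + rouge_l_f1(pred["A"], gold["A"]) + rouge_l_f1(pred["O"], gold["O"]))


def pair_units(preds, golds):
    """Brute-force one-to-one pairing that maximises the total pairing weight."""
    weights = [[pair_weight(p, g) for g in golds] for p in preds]
    n_p, n_g = len(preds), len(golds)
    if n_p <= n_g:
        candidates = ([(i, perm[i]) for i in range(n_p)]
                      for perm in permutations(range(n_g), n_p))
    else:
        candidates = ([(perm[j], j) for j in range(n_g)]
                      for perm in permutations(range(n_p), n_g))
    best_total, best_pairs = -1.0, []
    for pairs in candidates:
        total = sum(weights[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_pairs


def review_f1(preds, golds):
    """F1 for one review: paired units that pass is_match count as true positives."""
    tp = sum(is_match(preds[i], golds[j]) for i, j in pair_units(preds, golds))
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical review with two gold and three predicted quadruplets.
golds = [
    {"A": "the lecturer", "O": "very helpful", "C": "Staff", "S": "positive"},
    {"A": "assignments", "O": "too long", "C": "Course", "S": "negative"},
]
preds = [
    {"A": "lecturer", "O": "helpful", "C": "Staff", "S": "positive"},
    {"A": "assignments", "O": "too long", "C": "Course", "S": "negative"},
    {"A": "course", "O": "good", "C": "Course", "S": "positive"},
]
score = review_f1(preds, golds)
```

In this made-up example the first prediction trims "the" and "very" from the gold spans, so strict exact matching would reject it, while the thresholded Rouge-L comparison accepts it because the category and sentiment labels still match exactly.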