
nchellwig at SemEval-2026 Task 3: Self-Consistent Structured Generation (SCSG) for Dimensional Aspect-Based Sentiment Analysis using Large Language Models

Nils Constantin Hellwig, Jakob Fehle, Udo Kruschwitz, Christian Wolff

TL;DR

Evaluation across 6 languages and 8 language-domain combinations demonstrates that self-consistency with 15 executions yields statistically significant improvements over single-inference prompting, with the SCSG system ranking in the top seven across all settings.

Abstract

We present Self-Consistent Structured Generation (SCSG) for Dimensional Aspect-Based Sentiment Analysis in SemEval-2026 Task 3 (Track A). SCSG enhances prediction reliability by executing a LoRA-adapted large language model multiple times per instance, retaining only tuples that achieve a majority consensus across runs. To mitigate the computational overhead of multiple forward passes, we leverage vLLM's PagedAttention mechanism for efficient key-value cache reuse. Evaluation across 6 languages and 8 language-domain combinations demonstrates that self-consistency with 15 executions yields statistically significant improvements over single-inference prompting, with our system (leveraging Gemma 3) ranking in the top seven across all settings, achieving second place on three out of four English subsets and first place on Tatar-Restaurant for DimASTE.

Paper Structure (26 sections, 1 equation, 2 figures, 7 tables)

This paper contains 26 sections, 1 equation, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Prompt used for SCSG. The prompt comprises descriptions of the considered sentiment elements (4 for DimASTE, 5 for DimASQP), explanations regarding the range of valence and arousal, the desired output format, and the example text for which ABSA is to be performed.
  • Figure 2: Self-consistency majority voting for DimASTE over $k=5$ runs. Aspect-sentiment pairs (ignoring valence-arousal values) appearing in $\geq \tau = \lceil k/2 \rceil$ runs are aggregated by averaging their valence and arousal values. The aggregation section shows the explicit calculation. Light blue rows highlight matching (Decor, nice) variants; light green rows highlight matching (service, spotty) variants. For DimASQP, in addition to the aspect term and sentiment polarity, the aspect category is considered as well.
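The voting scheme described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `aggregate_runs`, the tuple layout `(aspect, opinion, valence, arousal)`, exact string matching of pairs, and all numeric values in the example are assumptions made here. Only the threshold $\tau = \lceil k/2 \rceil$ and the averaging of valence and arousal over matching pairs come from the caption.

```python
from collections import defaultdict
from math import ceil


def aggregate_runs(runs, tau=None):
    """Majority-vote k runs of (aspect, opinion, valence, arousal) tuples.

    A pair (aspect, opinion) is kept if it appears in at least tau runs;
    its valence and arousal are the means over the runs that produced it.
    Pairs are matched by exact string equality (an assumption).
    """
    k = len(runs)
    if tau is None:
        tau = ceil(k / 2)  # threshold from the Figure 2 caption

    votes = defaultdict(list)  # (aspect, opinion) -> [(valence, arousal), ...]
    for run in runs:
        seen = set()
        for aspect, opinion, valence, arousal in run:
            key = (aspect, opinion)
            if key in seen:  # count each pair at most once per run
                continue
            seen.add(key)
            votes[key].append((valence, arousal))

    aggregated = []
    for (aspect, opinion), va in votes.items():
        if len(va) >= tau:
            mean_valence = sum(v for v, _ in va) / len(va)
            mean_arousal = sum(a for _, a in va) / len(va)
            aggregated.append((aspect, opinion, mean_valence, mean_arousal))
    return aggregated


# Hypothetical outputs of k = 5 runs; all scores are made up for illustration.
runs = [
    [("Decor", "nice", 7.0, 6.0), ("service", "spotty", 3.0, 5.0)],
    [("Decor", "nice", 7.5, 6.5), ("service", "spotty", 3.5, 5.5)],
    [("Decor", "nice", 6.5, 5.5)],
    [("Decor", "nice", 7.0, 6.0), ("service", "spotty", 3.0, 5.0),
     ("food", "good", 7.0, 6.0)],
    [("service", "slow", 3.0, 5.0)],
]

result = aggregate_runs(runs)
for row in result:
    print(row)
```

With $k = 5$ the threshold is $\tau = 3$, so in this made-up example (Decor, nice) with four votes and (service, spotty) with three votes are kept, while pairs that appear in only one run are discarded. For DimASQP, the key would presumably be extended with the aspect category, as the caption indicates.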