
LogSigma at SemEval-2026 Task 3: Uncertainty-Weighted Multitask Learning for Dimensional Aspect-Based Sentiment Analysis

Baraa Hikal, Jonas Becker, Bela Gipp

Abstract

This paper describes LogSigma, our system for SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA). Unlike traditional Aspect-Based Sentiment Analysis (ABSA), which predicts discrete sentiment labels, DimABSA requires predicting continuous Valence and Arousal (VA) scores on a 1-9 scale. A central challenge is that Valence and Arousal differ in prediction difficulty across languages and domains. We address this with learned homoscedastic uncertainty: the model learns task-specific log-variance parameters that automatically balance the two regression objectives during training. Combined with language-specific encoders and multi-seed ensembling, LogSigma achieves 1st place on five datasets across both tracks. The learned variance weights vary substantially across languages, reflecting differing Valence-Arousal difficulty profiles (from 0.66x for German to 2.18x for English) and demonstrating that optimal task balancing is language-dependent and cannot be determined a priori.
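The uncertainty weighting described above follows the standard learned-homoscedastic-uncertainty formulation: each task's MSE is scaled by a learned precision $\exp(-s_t)$, where $s_t = \log \sigma_t^2$, and $s_t$ itself is added as a regularizer so the model cannot trivially inflate the variances. A minimal NumPy sketch of that loss (function and variable names are illustrative, not taken from the released system):

```python
import numpy as np

def uncertainty_weighted_loss(pred_v, pred_a, gold_v, gold_a, s_v, s_a):
    """Precision-weighted two-task regression loss.

    s_v and s_a are learnable log-variances (s = log sigma^2). Each task's
    MSE is down-weighted by exp(-s), and s is added back so that increasing
    a task's variance is penalized rather than free.
    """
    mse_v = np.mean((np.asarray(pred_v) - np.asarray(gold_v)) ** 2)
    mse_a = np.mean((np.asarray(pred_a) - np.asarray(gold_a)) ** 2)
    return np.exp(-s_v) * mse_v + s_v + np.exp(-s_a) * mse_a + s_a
```

With `s_v = s_a = 0` this reduces to a plain unweighted sum of the two MSE terms; during training, a task whose residual noise is larger drives its log-variance up and its effective gradient weight down.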

Figures (7)

  • Figure 1: Overview of LogSigma. Left: Input text and aspect are encoded as [CLS] text [SEP] aspect [SEP] and passed through a language-specific pretrained encoder to obtain $\mathbf{h} \in \mathbb{R}^d$. Middle: Dual fully-connected heads predict Valence and Arousal independently; during training (dashed region), learnable log-variance parameters $s = \log \sigma^2$ balance the MSE losses via precision weighting. Right: At inference, predictions from three seeds are averaged to produce the final V.VV#A.AA output.
  • Figure 2: Learned task variance ($\sigma^2$) across Track B languages (seed 42). Higher $\sigma^2$ indicates greater task-specific noise, causing the model to down-weight that objective. English and Pidgin show $\sigma^2_V > \sigma^2_A$ (arousal receives higher weight), while German shows the opposite ($\sigma^2_A > \sigma^2_V$). English and Pidgin learn near-identical values despite per-language training, confirming their shared V/A difficulty profile.
  • Figure 3: English-Pidgin $\sigma^2$ convergence across all three seeds. Despite per-language training on separate datasets, English and Pidgin learn near-identical variance parameters on every seed ($\Delta \leq 0.001$). This quantitatively confirms that the Twitter-RoBERTa encoder perceives these two languages as having identical V/A difficulty profiles, consistent with Nigerian Pidgin being an English-based creole.
  • Figure 4: Arousal PCC per encoder across Track B languages. XLM-RoBERTa-Large and Afro-XLMR-Large achieve the highest arousal correlations on most datasets. mDeBERTa and XLM-Twitter-Politics fall below the near-zero threshold (dashed line) on German, demonstrating that their pretraining does not transfer to arousal regression. Pidgin is the only language where all encoders achieve reasonable arousal PCC ($>$0.29), likely because its English-based lexicon benefits all English-pretrained models.
  • Figure 5: Nigerian Pidgin model progression (RMSE on dev set). Switching from multilingual XLM-R (0.98) to English-pretrained Twitter-RoBERTa reduces RMSE by 21.4%. Adding 3-seed ensembling and learned uncertainty reduces it by a further 19.4% (0.77 $\to$ 0.62), demonstrating that encoder selection, ensembling, and loss design each contribute substantially.
  • ...and 2 more figures
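At inference (Figure 1, right), per-seed Valence and Arousal predictions are averaged and serialized in the task's V.VV#A.AA output format. A small sketch of that final step, assuming predictions arrive as (valence, arousal) pairs per seed (names are illustrative):

```python
def ensemble_va(seed_preds):
    """Average (valence, arousal) predictions from multiple seeds and
    format the result as a DimABSA-style 'V.VV#A.AA' string."""
    n = len(seed_preds)
    v = sum(p[0] for p in seed_preds) / n
    a = sum(p[1] for p in seed_preds) / n
    return f"{v:.2f}#{a:.2f}"
```

For example, three seeds predicting (7.0, 5.0), (7.2, 5.2), and (6.8, 4.8) average to the string "7.00#5.00".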