LPI-RIT at LeWiDi-2025: Improving Distributional Predictions via Metadata and Loss Reweighting with DisCo

Mandira Sawkar, Samay U. Shetty, Deepak Pandita, Tharindu Cyril Weerasooriya, Christopher M. Homan

TL;DR

This work tackles the challenge of predicting annotator disagreement in LeWiDi-2025 by extending the DisCo framework to incorporate annotator metadata and task-aligned multi-objective losses. The proposed DisCo_New architecture uses metadata embeddings and task-specific encoders to jointly model item-level and annotator-level label distributions, optimizing losses that align with the soft-label and perspectivist evaluations, including Wasserstein distance and MAE components. Across the CSC, MultiPico, and Par datasets, DisCo_New yields consistent improvements in both soft-label distributions and perspectivist predictions, and calibration analyses show more reliable uncertainty handling and better annotator-specific alignment. These findings demonstrate that leveraging demographic and contextual annotator information, together with loss objectives tied to the evaluation metrics, enhances the ability to capture human disagreement in complex linguistic tasks, with implications for robustness and fairness in real-world systems.
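As a rough illustration of the two loss components named above, the following sketch computes a Wasserstein distance between two label distributions over ordered classes and a mean absolute error over per-annotator predictions, then combines them with weights. The function names, the unit spacing between classes, and the weights `w_soft` and `w_persp` are assumptions made for this example; the summary does not specify how DisCo_New defines or weights its losses.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two discrete distributions defined
    over the same ordered labels, assuming unit spacing between labels.
    Computed as the summed absolute difference of the two CDFs."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def mean_absolute_error(pred, target):
    """Mean absolute error between predicted and observed annotator ratings."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(pred)


def combined_loss(pred_dist, true_dist, pred_ratings, true_ratings,
                  w_soft=0.5, w_persp=0.5):
    """Hypothetical multi-objective loss: a weighted sum of a soft-label
    term (Wasserstein) and a perspectivist term (MAE). The weights are
    illustrative placeholders, not values taken from the paper."""
    soft_term = wasserstein_1d(pred_dist, true_dist)
    persp_term = mean_absolute_error(pred_ratings, true_ratings)
    return w_soft * soft_term + w_persp * persp_term
```

A training implementation would express the same quantities with differentiable tensor operations; the plain-Python version is only meant to make the arithmetic explicit.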

Abstract

The Learning With Disagreements (LeWiDi) 2025 shared task aims to model annotator disagreement through soft label distribution prediction and perspectivist evaluation, which focuses on modeling individual annotators. We adapt DisCo (Distribution from Context), a neural architecture that jointly models item-level and annotator-level label distributions, and present detailed analysis and improvements. In this paper, we extend DisCo by introducing annotator metadata embeddings, enhancing input representations, and adding multi-objective training losses to better capture disagreement patterns. Through extensive experiments, we demonstrate substantial improvements in both soft and perspectivist evaluation metrics across three datasets. We also conduct in-depth calibration and error analyses that reveal when and why disagreement-aware modeling helps. Our findings show that disagreement can be better captured by conditioning on annotator demographics and by optimizing directly for distributional metrics, yielding consistent improvements across datasets.

Paper Structure

This paper contains 38 sections, 2 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Data representation for DisCo: each item $\mathbf{x}_m$ is paired with per‐annotator responses $\mathbf{y}_{\cdot,m}$ and their empirical distribution $\#\mathbf{y}_{\cdot,m}$, and each annotator $n$ has a response vector $\mathbf{y}_{n,\cdot}$ with distribution $\#\mathbf{y}_{n,\cdot}$.
  • Figure 2: Block diagram of the DisCo encoder and decoder. The encoder maps item and annotator inputs into a joint latent code $\mathbf{z}_E$, and the decoder produces three parallel distributions via softmax heads.
  • Figure 3: Metadata embedding pipeline for DisCo_New: raw metadata is converted into natural language and passed through a transformer to generate embeddings, which are eventually used to produce $a_n$.
  • Figure 4: Soft-label confusion matrix for MP dev set (DisCo_New). Improved recall for the Ironic class is shown compared to DisCo_OG.
  • Figure 5: Prediction error vs. modal label probability for the MP dev set. Fewer high-error outliers at high confidence are seen for DisCo_New.
  • ...and 7 more figures
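
The data representation described in the Figure 1 caption can be sketched in a few lines. The example below builds the empirical label distribution for an item (over the annotators who labeled it) and for an annotator (over the items they labeled) from a sparse table of responses. The toy `responses` table, the identifiers `a1`, `x1`, and so on, and the helper names are all hypothetical and serve only to illustrate the idea; they are not taken from the paper or its datasets.

```python
from collections import Counter


def empirical_distribution(labels, num_classes):
    """Normalized label counts: the share of responses in each class."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    return [counts.get(k, 0) / len(labels) for k in range(num_classes)]


# Hypothetical sparse response table: (annotator, item) -> label.
responses = {
    ("a1", "x1"): 0,
    ("a2", "x1"): 1,
    ("a3", "x1"): 1,
    ("a1", "x2"): 0,
    ("a2", "x2"): 0,
}


def item_distribution(responses, item, num_classes):
    """Empirical distribution of all annotators' labels for one item."""
    labels = [y for (_, m), y in responses.items() if m == item]
    return empirical_distribution(labels, num_classes)


def annotator_distribution(responses, annotator, num_classes):
    """Empirical distribution of one annotator's labels across items."""
    labels = [y for (n, _), y in responses.items() if n == annotator]
    return empirical_distribution(labels, num_classes)
```

Here `item_distribution(responses, "x1", 2)` plays the role of the item-level distribution and `annotator_distribution(responses, "a2", 2)` the annotator-level one, under the assumption that labels are integer class indices.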