LPI-RIT at LeWiDi-2025: Improving Distributional Predictions via Metadata and Loss Reweighting with DisCo
Mandira Sawkar, Samay U. Shetty, Deepak Pandita, Tharindu Cyril Weerasooriya, Christopher M. Homan
TL;DR
This work tackles the challenge of predicting annotator disagreement in LeWiDi-2025 by extending the DisCo framework to incorporate annotator metadata and task-aligned multi-objective losses. The proposed DisCo_New architecture uses metadata embeddings and task-specific encoders to jointly model item-level and annotator-level label distributions, optimizing losses that align with the soft-label and perspectivist evaluations, including Wasserstein distance and MAE components. Across CSC, MultiPico, and Par, DisCo_New yields consistent improvements in both soft-label distributions and perspectivist predictions, with calibration analyses showing more reliable uncertainty handling and better annotator-specific alignment. These findings indicate that leveraging demographic and contextual annotator information, together with loss objectives tied to the evaluation metrics, improves the ability to capture human disagreement in complex linguistic tasks, with implications for robustness and fairness in real-world systems.
Abstract
The Learning With Disagreements (LeWiDi) 2025 shared task aims to model annotator disagreement through soft label distribution prediction and perspectivist evaluation, which focuses on modeling individual annotators. We adapt DisCo (Distribution from Context), a neural architecture that jointly models item-level and annotator-level label distributions, and present detailed analysis and improvements. In this paper, we extend DisCo with annotator metadata embeddings, enhanced input representations, and multi-objective training losses to better capture disagreement patterns. Through extensive experiments, we demonstrate substantial improvements in both soft and perspectivist evaluation metrics across three datasets. We also conduct in-depth calibration and error analyses that reveal when and why disagreement-aware modeling improves predictions. Our findings show that disagreement can be better captured by conditioning on annotator demographics and by optimizing directly for distributional metrics, yielding consistent improvements across datasets.
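To make "optimizing directly for distributional metrics" concrete, the sketch below computes the two kinds of quantities named in the TL;DR: a Wasserstein distance between a predicted and a gold soft label, and a mean absolute error over per-annotator ratings. It is a minimal illustration, not the authors' loss or the official shared-task scorer. It assumes ordinal labels with unit spacing, so that the 1-D Wasserstein distance reduces to the summed absolute difference of the cumulative distributions; the function names and example values are hypothetical.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two distributions over the same
    ordered, unit-spaced labels (assumption: labels are ordinal)."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same label set")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # In 1-D, W1 equals the summed absolute difference of the two CDFs.
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def mean_absolute_error(predicted, gold):
    """Average absolute gap between predicted and gold annotator ratings."""
    if len(predicted) != len(gold):
        raise ValueError("need one prediction per gold rating")
    return sum(abs(a - b) for a, b in zip(predicted, gold)) / len(gold)


# Hypothetical soft labels over three ordered classes.
soft_pred = [0.5, 0.5, 0.0]
soft_gold = [0.0, 0.5, 0.5]
soft_score = wasserstein_1d(soft_pred, soft_gold)

# Hypothetical ratings from three annotators for one item.
ratings_pred = [1, 3, 2]
ratings_gold = [2, 3, 4]
perspectivist_score = mean_absolute_error(ratings_pred, ratings_gold)
```

In this toy case half of the probability mass has to move two label positions, so the Wasserstein distance is 1.0, and the per-annotator errors of 1, 0, and 2 average to an MAE of 1.0. A trainable loss would need a differentiable version of the same computations, which this sketch does not attempt.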
