JuniperLiu at CoMeDi Shared Task: Models as Annotators in Lexical Semantics Disagreements
Zhu Liu, Zhen Hu, Ying Liu
TL;DR
The paper tackles disagreements in lexical semantics by framing Subtask 1 as estimating the mean $\mu$ of a judgment distribution and Subtask 2 as estimating the variance $\sigma^2$, treating each system as a virtual annotator. It combines threshold-based labeling with anisotropy removal and an MLP-based regressor for disagreement, leveraging model ensembling to capture annotator diversity. Results show that anisotropy removal and high-layer representations boost Subtask 1, while STD-based scores on continuous relatedness correlate with human disagreement for Subtask 2, with language-specific ensembling providing additional gains. The work offers a practical framework for simulating annotator diversity in multilingual lexical semantics and provides code at https://github.com/RyanLiut/CoMeDi_Solution.
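To make the virtual-annotator framing concrete, the following is a minimal sketch, not the authors' implementation. It assumes each model emits one continuous relatedness score per usage pair, that the mean is mapped to a discrete label through fixed cut points, and that the standard deviation across models serves as the disagreement estimate. The names `THRESHOLDS`, `to_label`, and `aggregate`, the cut-point values, and the 1 to 4 label range are hypothetical choices made for illustration.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Hypothetical cut points; in practice they would be tuned on development data.
THRESHOLDS = [0.25, 0.5, 0.75]


def to_label(score, thresholds=THRESHOLDS):
    """Map a continuous score to an ordinal label in 1..len(thresholds)+1."""
    return bisect_right(thresholds, score) + 1


def aggregate(scores_per_model):
    """Aggregate the scores that several virtual annotators gave one usage pair.

    Returns (label, disagreement): the thresholded mean score and the
    population standard deviation of the raw continuous scores.
    """
    mu = mean(scores_per_model)
    sigma = pstdev(scores_per_model)
    return to_label(mu), sigma
```

Calling `aggregate([0.6, 0.6, 0.6])` returns label 3 with zero disagreement, while a wider spread of scores yields a larger standard deviation for the same pair.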
Abstract
We present the results of our system for the CoMeDi Shared Task, which predicts majority votes (Subtask 1) and annotator disagreements (Subtask 2). Our approach combines model ensemble strategies with MLP-based and threshold-based methods built on pretrained language models. Treating individual models as virtual annotators, we simulate the annotation process by designing aggregation measures that incorporate continuous relatedness scores and discrete classification labels to capture both majority and disagreement. Additionally, we employ anisotropy removal techniques to enhance performance. Experimental results demonstrate the effectiveness of our methods, particularly for Subtask 2. Notably, we find that the standard deviation of continuous relatedness scores across different model manipulations correlates better with human disagreement annotations than metrics computed on aggregated discrete labels. The code will be published at https://github.com/RyanLiut/CoMeDi_Solution.
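The abstract does not specify which anisotropy removal technique is used. As an illustration only, the sketch below applies mean-centering, one common choice, before computing cosine similarity. The helper names `center` and `cosine` and the toy vectors are assumptions, not part of the released code.

```python
import math


def center(vectors):
    """Subtract the mean vector so a shared dominant direction is removed."""
    dim = len(vectors[0])
    mean_vec = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mean_vec[i] for i in range(dim)] for v in vectors]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy embeddings that share a large common component along the first axis.
raw = [[10.0, 1.0], [10.0, -1.0]]
centered = center(raw)
raw_similarity = cosine(raw[0], raw[1])
centered_similarity = cosine(centered[0], centered[1])
```

In this toy case the raw vectors look nearly identical because of the shared component, whereas the centered vectors point in opposite directions. This is the kind of contrast that anisotropy removal is meant to recover.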
