Opt-ICL at LeWiDi-2025: Maximizing In-Context Signal from Rater Examples via Meta-Learning
Taylor Sorensen, Yejin Choi
TL;DR
Opt-ICL combines in-context learning with two-stage meta-learning to model annotator disagreement across LeWiDi tasks. By combining Spectrum Tuning, dataset-specific training, and careful in-context inference that leverages rater demonstrations, the system achieves strong performance and was the overall winner on both tasks. Key findings: in-context rater examples are crucial, larger datasets benefit from dataset-specific tuning, Spectrum Tuning helps on at least one dataset, and model scale aids performance but cannot replace targeted training. The work advances practical methods for modeling human variation in NLP and informs robust evaluation and calibration under disagreement.
Abstract
Many natural language processing (NLP) tasks involve subjectivity, ambiguity, or legitimate disagreement between annotators. In this paper, we outline our system for modeling human variation. Our system leverages the in-context learning abilities of large language models (LLMs), along with a two-step meta-learning training procedure that 1) post-trains the model on many datasets requiring in-context learning and 2) specializes the model, via in-context meta-learning, to the particular data distribution of interest. We also evaluate the performance of our system submission to the Learning With Disagreements (LeWiDi) competition, where it was the overall winner on both tasks. Additionally, we perform an ablation study to measure the importance of each system component. We find that including rater examples in-context is crucial for our system's performance, that dataset-specific fine-tuning is helpful on the larger datasets, that post-training on other in-context datasets is helpful on one of the competition datasets, and that performance improves with model scale.
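To make the idea of placing rater examples in-context concrete, the following is a minimal sketch of how a single rater's earlier annotations might be laid out ahead of an unlabeled item. The function name `build_rater_prompt`, the `Text:`/`Label:` template, and the example sentences and labels are all hypothetical and chosen for illustration; they are not the prompt format used in our system.

```python
from typing import List, Tuple


def build_rater_prompt(demos: List[Tuple[str, str]], target_text: str) -> str:
    """Format one rater's past (text, label) pairs, then the unlabeled target.

    Hypothetical template: each demonstration becomes a "Text/Label" block,
    and the final block leaves the label empty for the model to complete.
    """
    blocks = [f"Text: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Text: {target_text}\nLabel:")
    return "\n\n".join(blocks)


# Invented demonstrations from a single (hypothetical) rater.
demos = [
    ("Great, another Monday.", "sarcastic"),
    ("I love this song.", "not sarcastic"),
]
prompt = build_rater_prompt(demos, "Oh sure, that will work.")
print(prompt)
```

Under this layout, a model conditioned on the prompt completes the final label in a way that can reflect the individual rater's earlier judgments, which is the signal the in-context examples are meant to supply.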
