Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models

Lynnette Hui Xian Ng, Kokil Jaidka, Kaiyuan Tay, Hansin Ahuja, Niyati Chhaya

TL;DR

The paper tackles the challenge of predicting user behavior from conversational text when crowdsourced labels are noisy. It proposes MSWEEM, a metadata-sensitive ensemble that uses annotator meta-features such as Throughput and Worktime to weight auxiliary label encodings before predicting the target variable. Empirical results show a 14% improvement on held-out Diplomacy data and about 12% on OffMyChest, with meta-features significantly enhancing performance across datasets and annotator cohorts. The work demonstrates the practical value of incorporating annotator behavior signals into NLP workflows, offering actionable guidance for crowdsourcing designs and robust modeling under label-quality uncertainty. It also provides insights into how different annotator cohorts (e.g., Master-qualified workers) contribute to data quality and model performance, informing quality control and data collection strategies.

Abstract

Supervised machine-learning models often underperform in predicting user behaviors from conversational text, hindered by poor crowdsourced label quality and low NLP task accuracy. We introduce the Metadata-Sensitive Weighted-Encoding Ensemble Model (MSWEEM), which integrates annotator meta-features like fatigue and speeding. First, our results show MSWEEM outperforms standard ensembles by 14% on held-out data and 12% on an alternative dataset. Second, we find that incorporating signals of annotator behavior, such as speed and fatigue, significantly boosts model performance. Third, we find that annotators with higher qualifications, such as Master's, deliver more consistent and faster annotations. Given the increasing uncertainty over annotation quality, our experiments show that understanding annotator patterns is crucial for enhancing model accuracy in user behavior prediction.
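
The weighting idea summarised above (annotator meta-features such as Worktime and Throughput scale the auxiliary label encodings before the target is predicted) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `minmax` and `weight_encodings`, the min-max scaling, the averaging rule, the direction of each signal, and the toy values and units are all assumptions made for illustration only.

```python
import numpy as np

def minmax(col):
    """Scale a 1-D array to [0, 1]; a constant column maps to all ones."""
    lo, hi = col.min(), col.max()
    if hi == lo:
        return np.ones_like(col, dtype=float)
    return (col - lo) / (hi - lo)

def weight_encodings(aux_encodings, worktime, throughput):
    """Scale each row of auxiliary label encodings by a meta-feature weight.

    Assumption (not from the paper): longer worktime and lower throughput
    are treated as signs of more careful annotation, and the two signals
    are averaged into a single per-row weight in [0, 1].
    """
    aux = np.asarray(aux_encodings, dtype=float)
    wt = minmax(np.asarray(worktime, dtype=float))
    tp = 1.0 - minmax(np.asarray(throughput, dtype=float))
    weights = (wt + tp) / 2.0
    # Broadcast the per-row weight across all auxiliary label columns.
    return aux * weights[:, None], weights

# Toy data: three annotated items, three auxiliary labels each (hypothetical).
aux = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
worktime = [30.0, 10.0, 20.0]   # illustrative units
throughput = [2.0, 6.0, 4.0]    # illustrative units
weighted, weights = weight_encodings(aux, worktime, throughput)
```

In a fuller pipeline, the weighted encodings would be passed on as features to whatever classifier predicts the target variable; that step is omitted here.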

Paper Structure

This paper contains 37 sections, 9 figures, 16 tables.

Figures (9)

  • Figure 1: The proposed Metadata-Sensitive Weighted-Encoding Ensemble Model. The embeddings correspond to those generated through evaluating different classifiers and pre-trained tokenizers.
  • Figure 2: The label distributions for the CLAff-OffMyChest and CLAff-Diplomacy datasets. Because both datasets have highly imbalanced class distributions, we adjusted for the skew by tuning the loss functions during model training. A value of 1 indicates the presence of the class, and 0 its absence.
  • Figure 3: The probability distributions of the different meta-feature variants for the CLAff-Diplomacy dataset. The most informative features are expected to have a wider spread of values.
  • Figure 4: Predictive performance on the CLAff-Diplomacy dataset for the ensemble models in Table \ref{tab:bestofpipelinefull}. Enriched ensemble models improve upon simple ensemble models (those with only auxiliary attributes). PC1, WT1, and the other meta-feature variants are described in Section \ref{sec:metafeaturevariants} ('Meta-feature variants') and in Figure \ref{fig:diplomacymetadataplots}.
  • Figure 5: Ablation analysis for the CLAff-Diplomacy dataset: Effect of dataset size on the predictive performance.
  • ...and 4 more figures