LT4SG@SMM4H24: Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models

Dasun Athukoralage, Thushari Atapattu, Menasha Thilakaratne, Katrina Falkner

TL;DR

This work tackles binary classification of tweets from pregnancy-disclosing users to detect reports of children with medical disorders, contributing to digital epidemiology of childhood health outcomes. It compares a single fine-tuned RoBERTa-large model against a hard-voted ensemble of three fine-tuned BERTweet-large runs. The two approaches perform identically on validation data, but the ensemble generalizes better to unseen data and yields the best test performance (F1=0.938), surpassing the benchmark classifier by 1.18%. The approach highlights the value of ensembling for stabilizing predictions on small, high-variance datasets, and suggests that adding more fine-tuned runs to the ensemble could bring further gains. This has practical implications for scalable social-media-based monitoring of childhood health outcomes.

Abstract

This paper presents our approaches for the SMM4H24 Shared Task 5 on the binary classification of English tweets reporting children's medical disorders. Our first approach involves fine-tuning a single RoBERTa-large model, while the second approach entails ensembling the results of three fine-tuned BERTweet-large models. We demonstrate that although both approaches exhibit identical performance on validation data, the BERTweet-large ensemble excels on test data. Our best-performing system achieves an F1-score of 0.938 on test data, outperforming the benchmark classifier by 1.18%.
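The ensembling step can be illustrated with a minimal sketch of hard (majority) voting over the label predictions of three fine-tuned runs, followed by an F1 computation. This is not the authors' code: the function names `hard_vote` and `f1_positive`, the toy predictions, and the assumption that F1 is computed for the positive class (label 1) are all illustrative.

```python
from collections import Counter


def hard_vote(run_predictions):
    """Return the majority label per example across several runs.

    run_predictions: list of equal-length label lists, one list per run.
    With an odd number of runs and binary labels, ties cannot occur.
    """
    if not run_predictions:
        raise ValueError("at least one run is required")
    n_examples = len(run_predictions[0])
    if any(len(run) != n_examples for run in run_predictions):
        raise ValueError("all runs must predict the same number of examples")
    # zip(*runs) groups the predictions that each run made for one example.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*run_predictions)]


def f1_positive(y_true, y_pred, positive=1):
    """F1-score for the positive class (assumed here to be label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only (not results from the paper).
gold = [1, 0, 1, 0, 1, 0]
run_a = [1, 0, 1, 0, 1, 0]
run_b = [1, 1, 0, 0, 1, 0]
run_c = [0, 0, 1, 0, 1, 1]

ensemble = hard_vote([run_a, run_b, run_c])
ensemble_f1 = f1_positive(gold, ensemble)
run_b_f1 = f1_positive(gold, run_b)
```

In this toy case the errors of `run_b` and `run_c` fall on different examples, so the vote corrects them. This is the kind of variance reduction that ensembling is meant to provide on small datasets.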

Paper Structure

This paper contains 18 sections, 1 equation, 1 figure, and 5 tables.

Figures (1)

  • Figure 1: Confusion matrices of the RoBERTa-large best-run and BERTweet-large ensemble on the validation dataset.