Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse

Joseph Gatto, Madhusudan Basak, Yash Srivastava, Philip Bohlman, Sarah M. Preum

TL;DR

The paper presents an LLM-powered framework for curating and evaluating emerging opinion mining in online health communities, and shows that GPT-4 significantly outperforms prior work on zero-shot stance detection.

Abstract

In this paper, we develop an LLM-powered framework for the curation and evaluation of emerging opinion mining in online health communities. We formulate emerging opinion mining as a pairwise stance detection problem between (title, comment) pairs sourced from Reddit, where post titles contain emerging health-related claims on a topic that is not predefined. The claims are expressed by the user either explicitly or implicitly. We detail (i) a method of claim identification, the task of identifying whether a post title contains a claim, and (ii) an opinion mining-driven evaluation framework for stance detection using LLMs. We facilitate our exploration by releasing a novel test dataset, Long COVID-Stance, or LC-Stance, which can be used to evaluate LLMs on the tasks of claim identification and stance detection in online health communities. Long COVID is an emerging post-COVID disorder with uncertain and complex treatment guidelines, making it a suitable use case for our task. LC-Stance contains long COVID treatment-related discourse sourced from a Reddit community. Our evaluation shows that GPT-4 significantly outperforms prior work on zero-shot stance detection. We then perform thorough LLM diagnostics, identifying claim type (i.e., implicit vs. explicit claims) and comment length as sources of model error.
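The two-stage formulation described above (filter titles by claim identification, then classify the stance of each comment toward its title) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Sample` type and the `is_claim` and `classify_stance` callables are hypothetical placeholders for whatever LLM prompts one would use, and the toy keyword rules exist only so the sketch runs without a model. The label names follow the stance labels listed in the Figure 1 caption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Stance labels as named in the Figure 1 caption.
STANCE_LABELS = ("In-Favor", "Against", "None")


@dataclass(frozen=True)
class Sample:
    """One (title, comment) pair. Hypothetical container, not from the paper."""
    title: str    # Reddit post title, which may or may not contain a claim
    comment: str  # a comment posted under that title


def run_pipeline(
    samples: Iterable[Sample],
    is_claim: Callable[[str], bool],
    classify_stance: Callable[[str, str], str],
) -> List[Tuple[Sample, str]]:
    """Keep only samples whose title is judged a claim, then label stance."""
    results = []
    for sample in samples:
        # Stage 1: claim identification on the title alone.
        if not is_claim(sample.title):
            continue
        # Stage 2: pairwise stance detection between title and comment.
        label = classify_stance(sample.title, sample.comment)
        if label not in STANCE_LABELS:
            raise ValueError(f"unexpected stance label: {label!r}")
        results.append((sample, label))
    return results


# Toy stand-ins (assumptions): a real system would call an LLM here.
def toy_is_claim(title: str) -> bool:
    return not title.strip().endswith("?")


def toy_classify_stance(title: str, comment: str) -> str:
    text = comment.lower()
    if "disagree" in text:  # checked first because it contains "agree"
        return "Against"
    if "agree" in text:
        return "In-Favor"
    return "None"
```

Passing the two stages in as callables keeps the sketch independent of any particular model or prompt, so a zero-shot LLM classifier and a fine-tuned baseline could be swapped in behind the same interface.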

Paper Structure (28 sections, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Example post title and comment pairs from r/covidlonghaulers displaying flairs (Article/Research), claim type (Implicit/Explicit), and stance label (In-Favor/Against/None). In the "Favor" sample (top), the comment supports the claim in the title that long COVID causes brain changes. In the "Against" sample (bottom), the title and comment disagree on the source of vision problems (i.e. long COVID vs vaccine).
  • Figure 2: Comparison of the sample length distributions of LC-Stance and COVIDLies, a popular stance dataset sourced from Twitter/X. The mean lengths are similar for both datasets. However, LC-Stance contains significantly longer samples: maximum sample lengths in COVIDLies and LC-Stance are 124 and 544 words, respectively.
  • Figure 3: Overall pipeline of our LLM-powered data curation and evaluation framework for topic-agnostic stance detection. First, Reddit data is collected, then titles are filtered using claim identification. Finally, samples are fed to stance classifiers for evaluation.
  • Figure 4: Claim identification F1 scores for each claim type. We find that LLMs are robust to varying claim types, with GPT-3.5 exhibiting the best overall performance.
  • Figure 5: Stance detection performance broken down by claim type. We find that all models are generally robust to claim type, with three of the four models exhibiting higher performance on implicit examples.
  • ...and 1 more figure
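Figures 4 and 5 report performance split by claim type. A per-group breakdown of that kind could be computed as in the sketch below. The averaging scheme is an assumption: macro-averaged F1 is used here for illustration, and the summary above does not state which variant the paper uses. The record layout `(claim_type, gold, predicted)` and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-label F1 over labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def f1_by_claim_type(
    records: Iterable[Tuple[str, str, str]],
) -> Dict[str, float]:
    """Group (claim_type, gold, predicted) records and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for claim_type, gold, pred in records:
        grouped[claim_type][0].append(gold)
        grouped[claim_type][1].append(pred)
    return {ct: macro_f1(g, p) for ct, (g, p) in grouped.items()}
```

Comparing the resulting per-group scores side by side is one way to check whether a model degrades on implicit claims relative to explicit ones.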