
Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence

Anthony Hughes, Xingyi Song

TL;DR

This work tackles the challenge of aligning medical claims made on social media with established medical evidence. It introduces the Expansive Medical Claim Corpus (EMCC), a dataset produced by a synthetic-data generator trained on PICO-structured (Population, Intervention, Comparator, Outcome) inputs from the RedHOT corpus and Trialstreamer evidence. The EMCC supports training and evaluation on three tasks: identifying medical claims, extracting PICO spans, and retrieving relevant evidence. Domain-adapted models (notably BERT-Synth) achieve state-of-the-art performance on RedHOT benchmarks. Evidence retrieval benefits notably from including PICO elements in queries: BM25-PIO and DPR-PIO show substantial precision gains, and expert validation indicates practical relevance. The dataset and findings advance automated, evidence-aligned analysis of medical claims on social media, offering a path toward more reliable consumer-facing health information tools.

Abstract

Evidence-based medicine is the practice of making medical decisions that adhere to the latest and best-known evidence available at the time. Currently, the best evidence is often found in documents such as randomized controlled trials, meta-analyses, and systematic reviews. This research focuses on aligning medical claims made on social media platforms with this medical evidence, so that individuals without medical expertise can more effectively assess the veracity of such claims. We study three core tasks: identifying medical claims, extracting medical vocabulary from these claims, and retrieving evidence relevant to the identified claims. We propose a novel system that generates synthetic medical claims to aid each of these core tasks. We additionally introduce a novel dataset produced by our synthetic generator that, when applied to these tasks, demonstrates not only a more flexible and holistic approach, but also an improvement in all comparable metrics. We make our dataset, the Expansive Medical Claim Corpus (EMCC), available at https://zenodo.org/records/8321460.
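
To make the evidence retrieval task more concrete, the following is a minimal sketch of how extracted PICO elements might be appended to a claim before BM25 ranking. It is an illustration under stated assumptions, not the configuration used in this work: the function names (`build_query`, `bm25_scores`), the whitespace tokenizer, the BM25 parameters, and the three toy evidence sentences are all hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    # Score every document against the query with the standard Okapi BM25 formula.
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    doc_freq = Counter()
    for d in docs_tokens:
        doc_freq.update(set(d))
    scores = []
    for d in docs_tokens:
        term_freq = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = term_freq[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def build_query(claim, pio=None):
    # Plain query: the claim text only.
    # Augmented query: the claim plus any extracted PIO spans (hypothetical format).
    tokens = tokenize(claim)
    if pio:
        for span in pio.values():
            tokens.extend(tokenize(span))
    return tokens


# Toy evidence snippets invented for illustration only.
evidence = [
    "metformin reduced hba1c in adults with type 2 diabetes",
    "vitamin d supplementation did not reduce fracture risk in older adults",
    "statin therapy lowered ldl cholesterol in patients with heart disease",
]
docs = [tokenize(e) for e in evidence]

claim = "my doctor put me on metformin and my sugar numbers got better"
pio = {
    "population": "type 2 diabetes",
    "intervention": "metformin",
    "outcome": "hba1c",
}

plain = bm25_scores(build_query(claim), docs)
augmented = bm25_scores(build_query(claim, pio), docs)

for text, p, a in zip(evidence, plain, augmented):
    print(f"{p:6.3f} -> {a:6.3f}  {text}")
```

In this toy setting the lay wording of the claim shares only one term with the matching evidence sentence, while the PIO spans contribute clinical vocabulary that the evidence also uses, so the matching document's score rises when they are added.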

Paper Structure (25 sections, 1 figure, 16 tables)