
"Hey..! This medicine made me sick": Sentiment Analysis of User-Generated Drug Reviews using Machine Learning Techniques

Abhiram B. Nair, Abhinand K., Anamika U., Denil Tom Jaison, Ajitha V., V. S. Anoop

TL;DR

This work tackles sentiment analysis of user-generated drug reviews to extract real-world perceptions of drug effectiveness and adverse reactions. It builds a three-class classification pipeline from 5,170 WebMD reviews, which are scraped with Beautiful Soup, manually labeled, and transformed into embeddings with pre-trained models (BERT, SciBERT, BioBERT, SBERT). A suite of classifiers (Decision Tree, SVC, Random Forest, and Recurrent Neural Network) is trained on these embeddings, with RNNs typically achieving higher accuracy than the baseline methods. Performance is modest (accuracy in the mid-0.5 range), but the study demonstrates the feasibility of applying transformer-based representations to pharmacovigilance data and provides a scalable framework for real-world sentiment monitoring. Future work aims at larger datasets and more advanced deep learning models to boost performance.
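The embed-then-classify stage described above can be sketched roughly as follows. This is a minimal sketch, not the authors' code: the embeddings are replaced by synthetic placeholder vectors (a real run would use vectors produced by BERT, SciBERT, BioBERT, or SBERT), the RNN is omitted, and the function names, dimensions, split ratio, and hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_placeholder_embeddings(n_reviews=300, dim=32, seed=0):
    """Synthetic stand-in for transformer embeddings (assumption, not real data).

    Each review gets a label in {0: negative, 1: neutral, 2: positive} and a
    noisy vector drawn around a per-class center.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 3, size=n_reviews)
    centers = rng.normal(size=(3, dim))
    features = centers[labels] + rng.normal(scale=2.0, size=(n_reviews, dim))
    return features, labels


def compare_classifiers(features, labels, seed=0):
    """Train the three classical classifiers and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    # Hyperparameters are illustrative defaults, not values from the paper.
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "svc": SVC(),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```

Calling `compare_classifiers(*make_placeholder_embeddings())` returns a dictionary of test accuracies keyed by classifier name. Swapping the placeholder function for real sentence embeddings would leave the comparison loop unchanged, which is the main reason for keeping the two steps separate.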

Abstract

Sentiment analysis has become increasingly important in healthcare, especially in the biomedical and pharmaceutical fields. Data generated by the general public on drug effectiveness, side effects, and adverse reactions is a goldmine for agencies and drug manufacturers seeking to understand people's concerns and reactions. Although datasets on drug-related problems are difficult to obtain, sentiment analysis on this topic would be a significant boon to the field. This project proposes a drug review classification system that classifies user reviews of a particular drug as positive, negative, or neutral. The approach uses a dataset collected from publicly available sources of drug reviews, such as drugs.com. The collected data is manually labeled and then verified to ensure that the labels are correct. Three pre-trained language models (BERT, SciBERT, and BioBERT) are used to obtain embeddings, which are then used as features for machine learning classifiers such as decision trees, support vector machines, and random forests, as well as deep learning algorithms such as recurrent neural networks. The performance of these classifiers is quantified using precision, recall, and F1-score, and the results show that the proposed approaches are useful for analyzing people's sentiments about different drugs.
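To make the evaluation metrics concrete, the sketch below computes per-class precision, recall, and F1-score for a three-class sentiment task using only the Python standard library. The function name `per_class_scores` and the six toy labels are hypothetical and chosen for illustration; they are not data or code from the paper.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: {"precision", "recall", "f1"}} for each sentiment class."""
    pair_counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    scores = {}
    for label in labels:
        true_pos = pair_counts[(label, label)]
        predicted = sum(c for (_, p), c in pair_counts.items() if p == label)
        actual = sum(c for (t, _), c in pair_counts.items() if t == label)
        # Guard against division by zero when a class is never predicted or absent.
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy labels for illustration only (not from the paper's dataset).
toy_true = ["positive", "positive", "negative", "neutral", "negative", "neutral"]
toy_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]
toy_scores = per_class_scores(toy_true, toy_pred)
```

Reporting the three metrics per class, rather than accuracy alone, shows which sentiment class a classifier tends to confuse, which matters when the classes are imbalanced.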

"Hey..! This medicine made me sick": Sentiment Analysis of User-Generated Drug Reviews using Machine Learning Techniques

TL;DR

This work tackles sentiment analysis of user-generated drug reviews to extract real-world perceptions of drug effectiveness and adverse reactions. It builds a three-class classification pipeline using 5,170 WebMD reviews, which are scraped with Beautiful Soup, manually labeled, and transformed into embeddings via pre-trained models (BERT, SciBERT, BioBERT, SBERT). A suite of classifiers (Decision Tree, SVC, Random Forest, and Recurrent Neural Network) is trained on these embeddings, with RNNs typically achieving higher accuracy than baseline methods. Although results show modest performance (roughly mid-0.5 accuracy), the study demonstrates the feasibility of applying transformer-based representations to pharmacovigilance data and provides a scalable framework for real-world sentiment monitoring, with future work aimed at larger datasets and more advanced deep learning models to boost performance.

Abstract

Sentiment analysis has become increasingly important in healthcare, especially in the biomedical and pharmaceutical fields. The data generated by the general public on the effectiveness, side effects, and adverse drug reactions are goldmines for different agencies and medicine producers to understand the concerns and reactions of people. Despite the challenge of obtaining datasets on drug-related problems, sentiment analysis on this topic would be a significant boon to the field. This project proposes a drug review classification system that classifies user reviews on a particular drug into different classes, such as positive, negative, and neutral. This approach uses a dataset that is collected from publicly available sources containing drug reviews, such as drugs.com. The collected data is manually labeled and verified manually to ensure that the labels are correct. Three pre-trained language models, such as BERT, SciBERT, and BioBERT, are used to obtain embeddings, which were later used as features to different machine learning classifiers such as decision trees, support vector machines, random forests, and also deep learning algorithms such as recurrent neural networks. The performance of these classifiers is quantified using precision, recall, and f1-score, and the results show that the proposed approaches are useful in analyzing the sentiments of people on different drugs.
Paper Structure

This paper contains 16 sections, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Our proposed model for sentiment analysis
  • Figure 2: Training and testing accuracy for BERT and SBERT models for decision tree, SVC, random forest, and logistic regression classifiers
  • Figure 3: Training and testing accuracy for BioBERT and SciBERT models for decision tree, SVC, random forest, and logistic regression classifiers
  • Figure 4: Accuracy and loss for the BERT model for decision tree, SVC, random forest, and logistic regression classifiers
  • Figure 5: Accuracy and loss for the SBERT model for decision tree, SVC, random forest, and logistic regression classifiers
  • ...and 2 more figures