"Hey..! This medicine made me sick": Sentiment Analysis of User-Generated Drug Reviews using Machine Learning Techniques

Abhiram B. Nair; Abhinand K.; Anamika U.; Denil Tom Jaison; Ajitha V.; V. S. Anoop

TL;DR

This work applies sentiment analysis to user-generated drug reviews to capture real-world perceptions of drug effectiveness and adverse reactions. It builds a three-class classification pipeline on 5,170 WebMD reviews, which are scraped with Beautiful Soup, manually labeled, and converted into embeddings with pre-trained models (BERT, SciBERT, BioBERT, SBERT). Four classifiers (Decision Tree, SVC, Random Forest, and a Recurrent Neural Network) are trained on these embeddings, with the RNN typically achieving higher accuracy than the other classifiers. Performance is modest (accuracy in the mid-0.5 range), but the study shows that transformer-based representations can be applied to pharmacovigilance data and offers a scalable framework for monitoring sentiment in real-world reviews. Future work targets larger datasets and more advanced deep learning models to improve performance.
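
The embedding-then-classify step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: random vectors stand in for the pre-trained sentence embeddings, and the class centers, dimensions, split ratio, and hyperparameters are all assumptions chosen for the example. The recurrent neural network is omitted to keep the sketch short.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic vectors replace real BERT/SciBERT/BioBERT/SBERT
# embeddings, so the accuracies printed here say nothing about the paper.
rng = np.random.default_rng(0)
n_per_class, dim = 100, 32
labels = np.repeat([0, 1, 2], n_per_class)  # 0=negative, 1=neutral, 2=positive
centers = rng.normal(size=(3, dim))         # one hypothetical center per class
embeddings = centers[labels] + rng.normal(scale=1.0, size=(labels.size, dim))

# Hold out 20% of the reviews for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, stratify=labels, random_state=0
)

# The three classical classifiers named in the text, with default settings.
classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svc": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracy[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: test accuracy = {accuracy[name]:.3f}")
```

In a real run, `embeddings` would hold one vector per review produced by the chosen language model, and `labels` would hold the manually assigned sentiment classes.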

Abstract

Sentiment analysis has become increasingly important in healthcare, especially in the biomedical and pharmaceutical fields. Data generated by the general public on the effectiveness, side effects, and adverse reactions of drugs is a goldmine for agencies and medicine producers seeking to understand people's concerns and reactions. Although datasets on drug-related problems are difficult to obtain, sentiment analysis on this topic would be a significant boon to the field. This project proposes a drug review classification system that assigns each user review of a particular drug to one of three classes: positive, negative, or neutral. The approach uses a dataset collected from publicly available sources of drug reviews, such as drugs.com. The collected data is manually labeled and then verified to ensure that the labels are correct. Three pre-trained language models (BERT, SciBERT, and BioBERT) are used to obtain embeddings, which are then used as features for machine learning classifiers such as decision trees, support vector machines, and random forests, as well as deep learning algorithms such as recurrent neural networks. The performance of these classifiers is quantified using precision, recall, and F1-score, and the results show that the proposed approaches are useful for analyzing people's sentiments about different drugs.
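
As an illustration of the evaluation metrics named above, the following sketch computes per-class precision, recall, and F1-score for a three-class sentiment task. The function name `per_class_scores` and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return precision, recall, and F1 for each class (one-vs-rest)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy example with made-up labels for six reviews.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
scores = per_class_scores(y_true, y_pred)
for label, s in scores.items():
    print(label, {k: round(v, 3) for k, v in s.items()})
```

Reporting the three metrics per class, rather than accuracy alone, shows which sentiment class a classifier confuses most often; here the "positive" class has perfect recall but lower precision because one neutral review was predicted as positive.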

Paper Structure (16 sections, 7 figures, 4 tables)

This paper contains 16 sections, 7 figures, 4 tables.

Introduction
Related Studies
Materials and Methods
Beautiful Soup
Bidirectional Encoder Representations from Transformers
SciBERT
BioBERT
SBERT
Decision Tree
Support Vector Classification
Random Forest
Recurrent Neural Network
Proposed Approach
Experimental Setup
Results and Discussion
...and 1 more sections

Figures (7)

Figure 1: Our proposed model for sentiment analysis
Figure 2: Training and testing accuracy for BERT and SBERT models for decision tree, SVC, random forest, and logistic regression classifiers
Figure 3: Training and testing accuracy for BioBERT and SciBERT models for decision tree, SVC, random forest, and logistic regression classifiers
Figure 4: Accuracy and loss for the BERT model for decision tree, SVC, random forest, and logistic regression classifiers
Figure 5: Accuracy and loss for the SBERT model for decision tree, SVC, random forest, and logistic regression classifiers
...and 2 more figures