Applying BERT and ChatGPT for Sentiment Analysis of Lyme Disease in Scientific Literature

Teo Susnjak

TL;DR

The paper addresses bias detection in scholarly discourse on chronic Lyme disease by applying sentiment analysis to a large corpus of abstracts. It presents a practical NLP pipeline that uses BERT-based sentiment classification and ChatGPT-based validation, augmented by SHAP explanations. The work discusses data collection/cleaning, tool choices (code-free and coding workflows), and cross-domain challenges of applying business-domain sentiment models to medical literature. The contribution provides a reproducible framework for researchers to analyze sentiment in medical texts and highlights the need for domain-specific, clinically oriented sentiment models.

Abstract

This chapter presents a practical guide to conducting sentiment analysis with Natural Language Processing (NLP) techniques on text about tick-borne disease. The aim is to show how the presence of bias in the discourse surrounding chronic manifestations of the disease can be evaluated. Using a dataset of 5643 abstracts collected from scientific journals on the topic of chronic Lyme disease, the chapter demonstrates, in Python, the steps for conducting sentiment analysis with pre-trained language models and for validating the preliminary results with both interpretable machine learning tools and a novel methodology based on emerging state-of-the-art large language models such as ChatGPT. The chapter is intended as a resource for researchers and practitioners interested in using NLP techniques for sentiment analysis in the medical domain.

Paper Structure (10 sections, 2 figures)

Figures (2)

  • Figure 1: Example of the dataset showing the first five rows and the key columns, being the journal name, paper title, year of publication and the text of the abstract.
  • Figure 2: Example of reading the cleaned dataset into a Python variable.
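The two figure captions describe a table of abstracts (journal name, paper title, year of publication, abstract text) that is read into a Python variable. The figures themselves are not reproduced here, so the following is only a minimal sketch of what such a loading step might look like. The column names, the `load_abstracts` helper, and the sample rows are all hypothetical placeholders, not the chapter's actual code or data, and the standard-library `csv` module is used purely to keep the sketch dependency-free.

```python
import csv
import io

# Hypothetical column names, inferred from the Figure 1 caption;
# the real dataset may name its columns differently.
COLUMNS = ["journal", "title", "year", "abstract"]


def load_abstracts(source):
    """Read rows from an open CSV file object, skipping rows with an empty abstract."""
    reader = csv.DictReader(source)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        text = (row["abstract"] or "").strip()
        if not text:
            continue  # nothing to analyse without abstract text
        rows.append(
            {
                "journal": row["journal"].strip(),
                "title": row["title"].strip(),
                "year": int(row["year"]),
                "abstract": text,
            }
        )
    return rows


# Placeholder rows for illustration only; these are not records from the dataset.
SAMPLE_CSV = """journal,title,year,abstract
Journal A,Placeholder title one,2019,Placeholder abstract text one.
Journal B,Placeholder title two,2021,
Journal C,Placeholder title three,2020,Placeholder abstract text three.
"""

abstracts = load_abstracts(io.StringIO(SAMPLE_CSV))
print(len(abstracts), "abstracts loaded")
for row in abstracts[:5]:  # first five rows, as in the Figure 1 preview
    print(row["year"], row["journal"], "|", row["title"])
```

To read a file on disk instead of the in-memory sample, the same helper could be called as `load_abstracts(open(path, newline="", encoding="utf-8"))`, where `path` points at the cleaned CSV file.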