Mixed Feelings: Cross-Domain Sentiment Classification of Patient Feedback

Egil Rønningstad, Lilja Charlotte Storset, Petter Mæhlum, Lilja Øvrelid, Erik Velldal

TL;DR

This work addresses cross-domain sentiment classification for Norwegian patient feedback by comparing in-domain (NorPaC) and out-of-domain (NoReC) training data across neural and non-neural models. It evaluates four-class polarity (positive, negative, mixed, neutral) and analyzes the effects of joint multi-domain training, genre differences between the domains, and data scarcity. Key findings: neural models, especially NorBERT3 Large, achieve strong in-domain performance; out-of-domain data can boost performance when in-domain data are limited but may be detrimental when in-domain data are abundant; and the benefit of joint training depends on the target domain. The study provides practical guidance for deploying sentiment analysis on healthcare text and highlights both the value and the limits of leveraging general-domain sentiment data for specialized domains.

Abstract

Sentiment analysis of patient feedback from the public health domain can aid decision makers in evaluating the provided services. The current paper focuses on free-text comments in patient surveys about general practitioners and psychiatric healthcare, annotated with four sentence-level polarity classes -- positive, negative, mixed and neutral -- while also attempting to alleviate data scarcity by leveraging general-domain sources in the form of reviews. For several different architectures, we compare in-domain and out-of-domain effects, as well as the effects of training joint multi-domain models.

Paper Structure

This paper contains 13 sections, 1 figure, 7 tables.

Figures (1)

  • Figure 1: Learning curves for two configurations. NorPaC fractions: the model is trained on fractions of the NorPaC training split, starting from 6.25% $\approx$ 6% (386 samples) and successively doubling the training set up to the full NorPaC training split. NorPaC fractions + NoReC: the same fractions of the NorPaC training split, mixed with the full NoReC training split. All evaluations are on the full NorPaC test set, averaged over three runs with different seeds, with the amount of in-domain training data shown on a log scale.
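
The data schedule described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `doubling_fractions` and `learning_curve_splits`, the shuffling step, and the rounding of subset sizes are all assumptions, and the datasets are stand-in lists rather than the actual NorPaC and NoReC splits.

```python
import random


def doubling_fractions(start=0.0625):
    """Fractions start, 2*start, 4*start, ... ending with the full split (1.0)."""
    fractions = []
    frac = start
    while frac < 1.0:
        fractions.append(frac)
        frac *= 2
    fractions.append(1.0)
    return fractions


def learning_curve_splits(in_domain, out_domain=None, start=0.0625, seed=0):
    """Build one training set per fraction of the in-domain data.

    If `out_domain` is given, the full out-of-domain set is mixed into
    every fraction (the "fractions + out-of-domain" configuration).
    Shuffling with a fixed seed is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(in_domain)
    rng.shuffle(shuffled)

    splits = []
    for frac in doubling_fractions(start):
        n = round(len(shuffled) * frac)   # assumed rounding rule
        subset = shuffled[:n]             # smaller subsets are nested in larger ones
        if out_domain is not None:
            subset = subset + list(out_domain)
        splits.append((frac, subset))
    return splits


if __name__ == "__main__":
    # Stand-in data: 160 in-domain and 10 out-of-domain items (hypothetical sizes).
    in_dom = [f"in_{i}" for i in range(160)]
    out_dom = [f"out_{i}" for i in range(10)]
    for frac, train in learning_curve_splits(in_dom, out_dom):
        print(f"{frac:.4f}: {len(train)} training samples")
```

Each resulting training set would then be used to train a model that is evaluated on the same fixed test set, repeated over several seeds and averaged, as the caption describes.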