Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study

Zengzhi Wang, Qiming Xie, Yi Feng, Zixiang Ding, Zinong Yang, Rui Xia

TL;DR

Is ChatGPT a good sentiment analyzer? The paper conducts a preliminary, multi-faceted evaluation of ChatGPT's sentiment understanding across 7 tasks and 17 datasets in standard, polarity-shift, and open-domain settings, comparing it with fine-tuned BERT and SOTA models. It shows strong zero-shot performance, competitive open-domain generalization, and notable robustness to polarity shifts, while still lagging behind specialized SOTA models in domain-specific ABSA/CEE tasks. Advanced prompting techniques reveal that few-shot prompting can boost performance and self-consistency improves results, whereas chain-of-thought can hurt on some tasks. The findings suggest ChatGPT can serve as a universal sentiment analyzer with caveats and indicate directions for future open-domain benchmarks and domain adaptation.

Abstract

Recently, ChatGPT has drawn great attention from both the research community and the public. We are particularly interested in whether it can serve as a universal sentiment analyzer. To this end, in this work, we provide a preliminary evaluation of ChatGPT on the understanding of opinions, sentiments, and emotions contained in the text. Specifically, we evaluate it in three settings: standard evaluation, polarity shift evaluation, and open-domain evaluation. We conduct an evaluation on 7 representative sentiment analysis tasks covering 17 benchmark datasets and compare ChatGPT with fine-tuned BERT and corresponding state-of-the-art (SOTA) models on them. We also attempt several popular prompting techniques to elicit the ability further. Moreover, we conduct human evaluation and present some qualitative case studies to gain a deep comprehension of its sentiment analysis capabilities.

Paper Structure (22 sections, 7 figures, 11 tables)

Figures (7)

  • Figure 1: The overview of our evaluation.
  • Figure 2: Few-shot prompting results on ABSC and E2E-ABSA tasks.
  • Figure 3: Case study for ChatGPT on ABSC and E2E-ABSA in zero-shot and few-shot settings. Text in blue, black, green, and red denotes the given prompts, the examples to be evaluated, the responses of ChatGPT, and the ground truths, respectively.
  • Figure 4: Case study for ChatGPT on CSI and CEE. Text in blue, black, green, and red denotes the given prompts, the examples to be evaluated, the responses of ChatGPT, and the ground truths, respectively.
  • Figure 5: Case study for ChatGPT on ECE and ECPE in both Chinese (left) and English (right). Text in blue, black, green, and red denotes the given prompts, the examples to be evaluated, the responses of ChatGPT, and the ground truths, respectively.
  • ...and 2 more figures