Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study
Zengzhi Wang, Qiming Xie, Yi Feng, Zixiang Ding, Zinong Yang, Rui Xia
TL;DR
This paper presents a preliminary, multi-faceted evaluation of ChatGPT's sentiment understanding across 7 tasks and 17 datasets in standard, polarity-shift, and open-domain settings, comparing it with fine-tuned BERT and SOTA models. ChatGPT shows strong zero-shot performance, competitive open-domain generalization, and notable robustness to polarity shifts, but it still lags behind specialized SOTA models on domain-specific tasks such as aspect-based sentiment analysis (ABSA) and comparative element extraction (CEE). Among the prompting techniques tested, few-shot prompting and self-consistency improve performance, whereas chain-of-thought prompting can hurt on some tasks. The findings suggest that ChatGPT can serve as a universal sentiment analyzer, with caveats, and point to future work on open-domain benchmarks and domain adaptation.
Abstract
Recently, ChatGPT has drawn great attention from both the research community and the public. We are particularly interested in whether it can serve as a universal sentiment analyzer. To this end, we provide a preliminary evaluation of ChatGPT's understanding of the \emph{opinions}, \emph{sentiments}, and \emph{emotions} contained in text. Specifically, we evaluate it in three settings: \emph{standard} evaluation, \emph{polarity shift} evaluation, and \emph{open-domain} evaluation. The evaluation spans 7 representative sentiment analysis tasks covering 17 benchmark datasets, and compares ChatGPT with fine-tuned BERT and the corresponding state-of-the-art (SOTA) models. We also try several popular prompting techniques to further elicit its sentiment analysis ability. Moreover, we conduct a human evaluation and present qualitative case studies to gain a deeper understanding of its sentiment analysis capabilities.
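One of the prompting techniques named in the TL;DR, self-consistency, is commonly realized as a majority vote over several sampled answers to the same prompt. The following is a minimal sketch of that general idea, not the authors' implementation. The names `majority_label` and `make_stub_sampler` are hypothetical, and the stub sampler stands in for a real model call, which is not shown here.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Iterable


def majority_label(sample_fn: Callable[[str], str], text: str, n_samples: int = 5) -> str:
    """Query a sampler n_samples times and return the most frequent label.

    sample_fn is an assumed interface: it takes the input text and returns
    one predicted label string per call (e.g. from a model sampled with
    temperature > 0).
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Normalize case and whitespace so "Positive " and "positive" count as one vote.
    votes = [sample_fn(text).strip().lower() for _ in range(n_samples)]
    return Counter(votes).most_common(1)[0][0]


def make_stub_sampler(answers: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: replays a fixed cycle of answers."""
    replay = cycle(list(answers))
    return lambda _text: next(replay)


if __name__ == "__main__":
    stub = make_stub_sampler(["positive", "negative", "Positive "])
    print(majority_label(stub, "The battery life is great.", n_samples=5))
```

Using an odd number of samples reduces the chance of ties for binary labels; with three or more classes, ties remain possible and a real pipeline would need an explicit tie-breaking rule.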
