Reassessing the Role of Chain-of-Thought in Sentiment Analysis: Insights and Limitations

Kaiyuan Zheng, Qinghua Zhao, Lei Li

TL;DR

The paper examines whether chain-of-thought (CoT) reasoning enhances semantic understanding in sentiment analysis by testing CoT prompts on aspect-based sentiment analysis (ABSA) tasks. It evaluates Gemma-2 and LLaMA-3 models on the SemEval-2014 Laptop and Restaurant datasets and on a manually constructed MES dataset, covering both explicit and implicit sentiment. It also runs perturbation experiments to separate the influence of pre-trained knowledge from that of the demonstrations. The results show that CoT provides limited benefits, especially for larger models, and that model decisions often hinge on demonstrations rather than on pre-trained knowledge, with explicit sentiment cues aligning more closely with the inputs than implicit ones. These findings contribute to the language–thought debate by suggesting that language and thought are independent in this context, and they have practical implications for prompting strategies in sentiment analysis.
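The summary does not reproduce the paper's prompt templates. The sketch below is a purely hypothetical illustration of the contrast being tested. It builds a k-shot standard prompt and a k-shot CoT prompt for aspect-level sentiment, where the CoT variant differs only in adding a rationale before each demonstration label. The `Example` record, the instruction wording, and the helpers `format_demo` and `build_prompt` are assumptions, not the authors' templates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One ABSA instance (hypothetical schema, not the paper's)."""
    sentence: str
    aspect: str
    polarity: str = ""   # "positive", "negative", or "neutral"; empty for the query
    rationale: str = ""  # short reasoning chain, used only by CoT demonstrations


# Hypothetical instruction text; the paper's actual wording is not given here.
INSTRUCTION = (
    "Classify the sentiment toward the given aspect as "
    "positive, negative, or neutral.\n\n"
)


def format_demo(ex: Example, cot: bool) -> str:
    """Render one labeled demonstration; CoT demos add a rationale before the label."""
    text = f"Sentence: {ex.sentence}\nAspect: {ex.aspect}\n"
    if cot:
        text += f"Reasoning: {ex.rationale}\n"
    return text + f"Sentiment: {ex.polarity}\n"


def build_prompt(demos: List[Example], query: Example, cot: bool) -> str:
    """Concatenate k demonstrations and the unlabeled query into one k-shot prompt."""
    body = "\n".join(format_demo(d, cot) for d in demos)
    query_part = f"\nSentence: {query.sentence}\nAspect: {query.aspect}\n"
    # A CoT prompt asks the model to reason first; a standard prompt asks for the label.
    query_part += "Reasoning:" if cot else "Sentiment:"
    return INSTRUCTION + body + query_part


demo = Example(
    "The battery lasts all day.", "battery", "positive",
    "Lasting all day is a desirable property for a battery, so the opinion is favorable.",
)
query = Example("The screen is too dim outdoors.", "screen")
print(build_prompt([demo], query, cot=True))
```

Given the same demonstrations, the two prompts differ only in their `Reasoning:` lines. This keeps the effect of the reasoning chain separate from the effect of the demonstrations themselves.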

Abstract

The relationship between language and thought remains an unresolved philosophical issue. Existing viewpoints fall broadly into two schools: one holds that language and thought are independent, and the other argues that language constrains thought. For large language models, this debate raises a crucial question: does a language model's grasp of semantic meaning depend on thought processes? To explore this question, we investigate whether reasoning techniques can facilitate semantic understanding. Specifically, we conceptualize thought as reasoning, employ chain-of-thought prompting as the reasoning technique, and examine its impact on sentiment analysis tasks. The experiments show that chain-of-thought has minimal impact on these tasks. In the generated content, both standard and chain-of-thought prompts focus on aspect terms rather than on sentiment. Furthermore, counterfactual experiments reveal that the model's handling of sentiment tasks depends primarily on information from the demonstrations. These results support the first viewpoint, that language and thought are independent.
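The abstract reports that counterfactual experiments attribute the model's sentiment decisions mainly to the demonstrations. One common way to set up such a test is assumed here rather than taken from the paper. It swaps the polarity labels of the demonstrations and then checks whether predictions follow the swapped labels or the original gold labels. The `FLIP` table and the helpers `flip_demonstrations` and `follow_rates` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical counterfactual mapping: swap the two polar labels, keep neutral.
FLIP: Dict[str, str] = {
    "positive": "negative",
    "negative": "positive",
    "neutral": "neutral",
}


def flip_demonstrations(demos: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (text, label) demonstrations with polarity labels swapped."""
    return [(text, FLIP[label]) for text, label in demos]


def follow_rates(gold: List[str], preds: List[str]) -> Dict[str, float]:
    """Compare predictions made under flipped demonstrations against gold labels.

    Only polar gold labels are scored: flipping leaves neutral unchanged, so
    neutral items cannot distinguish the two sources of information.
    """
    if len(gold) != len(preds):
        raise ValueError("gold and preds must be aligned")
    pairs = [(g, p) for g, p in zip(gold, preds) if g != "neutral"]
    if not pairs:
        raise ValueError("no polar gold labels to score")
    n = len(pairs)
    return {
        # prediction matches the original label: pre-trained knowledge wins
        "follows_prior": sum(p == g for g, p in pairs) / n,
        # prediction matches the flipped label: the demonstrations win
        "follows_demos": sum(p == FLIP[g] for g, p in pairs) / n,
    }
```

Under flipped demonstrations, a `follows_demos` rate well above `follows_prior` would suggest that the demonstrations, rather than pre-trained knowledge, drive the prediction.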

Paper Structure (14 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: The framework of our work.
  • Figure 2: Overall accuracy across datasets and shots. The top row presents results for the Laptop dataset and the bottom row for the Restaurant dataset. Within each row, the panels show accuracy for the 1-, 4-, 8-, 12-, 15-, and 18-shot settings, respectively.
  • Figure 3: Agreement (Cohen’s Kappa value, treated as weights for majority voting) between model predictions and ground truth across datasets and sentiment types (1-shot setting). From left to right: Laptop (explicit), Laptop (implicit), Restaurant (explicit), and Restaurant (implicit).
  • Figure 4: Model accuracy as a function of emotion shifts and categories. Results shown for 4-shot CoT-v1 on the MES dataset.
  • Figure 5: Similarity between input and output tokens. The top row (Laptop) and bottom row (Restaurant) show results for the CoT-v1, CoT-v2, and CoT-v3 prompts (Gemma2-2b, 18-shot, explicit split).
  • ...and 2 more figures
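The Figure 3 caption states that Cohen's kappa between predictions and ground truth is treated as a weight for majority voting. The sketch below is a minimal version of that idea, not the paper's procedure. It assumes that each voter is one prompt variant or run whose kappa is computed on labeled data, and that negative kappa is clamped to zero so that such a voter contributes nothing. The function names, the run names, and the clamping rule are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two aligned label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[lab] * cb[lab] for lab in ca) / (n * n)   # chance agreement
    if p_e == 1.0:
        # Both sequences use one identical label, so kappa is undefined;
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def kappa_weighted_vote(
    predictions: Dict[str, List[str]], weights: Dict[str, float]
) -> List[str]:
    """Per item, pick the label with the largest summed voter weight.

    Negative weights are clamped to 0 (assumption). Ties go to the
    alphabetically first label so the result is deterministic.
    """
    names = list(predictions)
    n_items = len(predictions[names[0]])
    voted = []
    for i in range(n_items):
        score: Dict[str, float] = {}
        for name in names:
            label = predictions[name][i]
            score[label] = score.get(label, 0.0) + max(weights[name], 0.0)
        voted.append(max(sorted(score), key=score.__getitem__))
    return voted


# Hypothetical gold labels and predictions from three hypothetical runs.
gold = ["positive", "negative", "neutral", "positive"]
runs = {
    "run_a": ["positive", "negative", "neutral", "positive"],
    "run_b": ["positive", "positive", "positive", "positive"],
    "run_c": ["negative", "negative", "neutral", "negative"],
}
weights = {name: cohens_kappa(gold, preds) for name, preds in runs.items()}
print(kappa_weighted_vote(runs, weights))
```

Kappa-based weights let a voter that simply repeats the majority class (such as `run_b`, whose kappa is 0) be ignored. They also let one high-kappa voter outweigh two low-kappa voters that agree with each other, a case where an unweighted majority would pick the other label.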
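The Figure 5 caption reports the similarity between input and output tokens, but the metric is not given here. As a stand-in, the sketch below uses two simple lexical measures on lowercased word tokens. The first is the Jaccard similarity between the input and the generated text. The second is the fraction of distinct generated tokens that also occur in the input. Both measures and the tokenizer regex are assumptions, and the paper may use a different similarity, such as one based on embeddings.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Distinct lowercased word tokens (a deliberately simple, assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(source: str, generated: str) -> float:
    """|A ∩ B| / |A ∪ B| over the distinct tokens of the two texts."""
    a, b = tokens(source), tokens(generated)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def copy_rate(source: str, generated: str) -> float:
    """Fraction of distinct generated tokens that also appear in the source."""
    g = tokens(generated)
    if not g:
        return 0.0
    return len(g & tokens(source)) / len(g)


print(jaccard("The food was great", "great food"))
print(copy_rate("The food was great.", "The food is great"))
```

A high copy rate means the generated text, for example a CoT rationale, largely reuses the wording of the input sentence rather than introducing new content.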