Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang; Lillian Lee; Shivakumar Vaithyanathan

TL;DR

The paper investigates automatic sentiment classification of online text, focusing on positive versus negative movie reviews. It evaluates supervised machine learning approaches on an IMDb-derived corpus to assess how hard sentiment classification is relative to topic classification. Three standard bag-of-features classifiers (Naive Bayes, Maximum Entropy, and Support Vector Machines) are compared on 700 positive and 700 negative reviews, using a common feature representation with negation tagging and unigram and bigram features under three-fold cross-validation. The supervised methods outperform the human-produced baselines (50% to 69% accuracy), with SVMs tending to perform best. Unigram presence features are the most effective; bigrams generally provide little or no gain and can hurt; and part-of-speech features offer only modest gains for Naive Bayes while they can degrade SVM performance. The authors argue that sentiment is expressed more subtly than topic, highlight the value of corpus-driven features and negation handling, and suggest discourse- and sentence-level analysis as future work to further improve sentiment detection.
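
As a rough illustration of two ideas named above, negation tagging and unigram presence features, the following is a minimal sketch and not the authors' code. It assumes negation tagging means prefixing `NOT_` to each word between a negation word and the next punctuation mark. The negation word list, the tokenizer, and all function names are hypothetical choices made for this example.

```python
import re

# Hypothetical negation-word list, chosen for illustration only.
NEGATION_WORDS = {"not", "no", "never", "isn't", "didn't", "don't", "can't", "won't"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def tokenize(text):
    """Lowercase the text and split it into word tokens and punctuation marks."""
    return re.findall(r"[a-z']+|[.,;:!?]", text.lower())


def tag_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    tagged = []
    negating = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False  # punctuation closes the negation scope
            tagged.append(tok)
        elif negating:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
            if tok in NEGATION_WORDS:
                negating = True  # later words are tagged until punctuation
    return tagged


def presence_features(tokens):
    """Unigram presence: each distinct word maps to 1, however often it occurs."""
    return {tok: 1 for tok in tokens if tok not in PUNCTUATION}


tokens = tag_negation(tokenize("I didn't like this movie, but the cast was good."))
features = presence_features(tokens)
```

Here `like`, `this`, and `movie` become `NOT_like`, `NOT_this`, and `NOT_movie`, while the words after the comma are left untagged. A frequency representation would instead store a count per token; the presence representation above records only whether a token appears.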

Abstract

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

Paper Structure (17 sections, 5 equations, 3 figures)

Introduction
Previous Work
The Movie-Review Domain
A Closer Look At the Problem
Machine Learning Methods
Naive Bayes
Maximum Entropy
Support Vector Machines
Evaluation
Experimental Set-up
Results
Initial unigram results
Feature frequency vs. presence
Bigrams
Parts of speech
...and 2 more sections

Figures (3)

Figure 1: Baseline results for human word lists. Data: 700 positive and 700 negative reviews.
Figure 2: Results for baseline using introspection and simple statistics of the data (including test data).
Figure 3: Average three-fold cross-validation accuracies, in percent. Boldface: best performance for a given setting (row). Recall that our baseline results ranged from 50% to 69%.
