Twitter Sentiment Analysis using Distributed Word and Sentence Representation

Dwarampudi Mahidhar Reddy, N V Subba Reddy

TL;DR

The paper tackles Twitter sentiment analysis by leveraging distributed word and sentence representations. It compares LSTM and CNN architectures on word vectors (Word2Vec CBOW, Skipgram, FastText) and an MLP on sentence vectors, using a large Thinknook Twitter dataset with careful preprocessing. Key findings show that Word2Vec Skipgram offers a favorable balance of accuracy and efficiency, LSTMs with multiple layers outperform CNNs, and word vectors generally outperform sentence vectors for sentiment tasks. The work demonstrates memory-efficient representations and provides practical guidance on constructing vectors for effective sentiment analysis in short, noisy social media text.

Abstract

An important part of information gathering and data analysis is finding out what people think about a product or an entity. Twitter is an opinion-rich social networking site, and its posts, or tweets, can be mined for people's opinions. The recent surge of activity in this area can be attributed to the computational treatment of data, which has made opinion extraction and sentiment analysis easier. This paper classifies tweets into positive and negative sentiments, but instead of using traditional methods or preprocessing text data, we use distributed representations of words and sentences to classify the tweets. We use Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), and Artificial Neural Networks. The first two are applied to distributed representations of words, while the last is applied to distributed representations of sentences. This paper achieves accuracies as high as 81%. It also suggests the best ways of creating distributed representations of words for sentiment analysis out of the available methods.

Paper Structure

This paper contains 8 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Bag of Words representation of the sentence “It is the best of the best”, where every word in the sentence is treated as a feature. If only a few words are chosen as feature words, the remaining words are not represented in the vector at all.
  • Figure 2: Continuous Bag of Words representation of the sentence “It is the best of the best”. Every word has its own vector inside the sentence vector, with only one activated element and the rest zeros. As the number of words increases, the size of each of these naïve word vectors increases.
  • Figure 3: Word2Vec CBOW representations of words, plotted in a graph.
  • Figure 4: Word2Vec Skipgram word vectors, plotted in a graph.
  • Figure 5: FastText word vectors are plotted in this graph. Here, an almost clear distinction between the words related to ‘good’ and ‘bad’ can be seen though they become very close at times. Such words are used in both positive and negative contexts.
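
The representations described in the captions for Figures 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the lowercasing, the whitespace tokenization, the first-appearance vocabulary order, and the helper name one_hot are all choices made here for the example.

```python
from collections import Counter

sentence = "It is the best of the best"

# Assumption: lowercase and split on whitespace (the paper's own
# tokenization is not shown in this section).
tokens = sentence.lower().split()

# Vocabulary in order of first appearance (an arbitrary but stable choice).
vocab = list(dict.fromkeys(tokens))

# Bag of Words (Figure 1): one count per vocabulary word.
counts = Counter(tokens)
bow_vector = [counts[word] for word in vocab]

# If only some words are chosen as features, the others simply vanish
# from the vector.
feature_words = ["best", "the"]
restricted_bow = [counts[word] for word in feature_words]


def one_hot(word, vocabulary):
    """Hypothetical helper: a vector with a single 1 at the word's index."""
    return [1 if entry == word else 0 for entry in vocabulary]


# Naive per-word vectors (Figure 2): one one-hot vector per token, each as
# long as the vocabulary, so every vector grows as the vocabulary grows.
sentence_vectors = [one_hot(token, vocab) for token in tokens]

print(vocab)
print(bow_vector)
print(restricted_bow)
print(sentence_vectors[3])
```

With this vocabulary order, the full count vector has five entries, while the restricted one keeps only the two chosen feature words. Each one-hot vector is as long as the vocabulary, which is the growth problem that dense Word2Vec and FastText vectors avoid.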