Twitter Sentiment Analysis using Distributed Word and Sentence Representation
Dwarampudi Mahidhar Reddy, N V Subba Reddy
TL;DR
The paper tackles Twitter sentiment analysis by leveraging distributed word and sentence representations. It compares LSTM and CNN architectures on word vectors (Word2Vec CBOW, Skipgram, FastText) and an MLP on sentence vectors, using a large Thinknook Twitter dataset with careful preprocessing. Key findings show that Word2Vec Skipgram offers a favorable balance of accuracy and efficiency, LSTMs with multiple layers outperform CNNs, and word vectors generally outperform sentence vectors for sentiment tasks. The work demonstrates memory-efficient representations and provides practical guidance on constructing vectors for effective sentiment analysis in short, noisy social media text.
Abstract
An important part of information gathering and data analysis is finding out what people think about a product or an entity. Twitter is an opinion-rich social networking site, and its posts, or tweets, can be mined for people's opinions. The recent surge of activity in this area can be attributed to the computational treatment of data, which has made opinion extraction and sentiment analysis easier. This paper classifies tweets as positive or negative, but instead of relying on traditional methods or preprocessing of the text data, we use distributed representations of words and sentences to classify the tweets. We use Long Short Term Memory (LSTM) Networks, Convolutional Neural Networks (CNNs) and Artificial Neural Networks. The first two are applied to distributed representations of words, while the last is applied to distributed representations of sentences. This paper achieves accuracies as high as 81%. It also identifies, among the available methods, the most effective ways of creating distributed representations of words for sentiment analysis.
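To make the distinction between the two input types concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy 3-dimensional embedding table with made-up values (`TOY_VECTORS`), a whitespace tokenizer, and hypothetical helper names (`word_matrix`, `sentence_vector`). A sequence model such as an LSTM or CNN would consume one vector per word, while a feed-forward network would consume a single fixed-length vector per tweet. Mean pooling is used here only as a simple stand-in for a sentence representation and is not necessarily how the paper builds its sentence vectors.

```python
from typing import Dict, List

# Toy 3-dimensional embeddings; the values are invented for illustration.
# In practice these would come from a trained model such as Word2Vec or FastText.
TOY_VECTORS: Dict[str, List[float]] = {
    "love": [0.9, 0.1, 0.0],
    "this": [0.0, 0.2, 0.1],
    "phone": [0.1, 0.0, 0.8],
}
DIM = 3


def tokenize(tweet: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return tweet.lower().split()


def word_matrix(tweet: str, max_len: int) -> List[List[float]]:
    """One vector per token, truncated or zero-padded to max_len rows.

    This is the shape of input a sequence model (LSTM or CNN) would expect.
    Unknown tokens are mapped to the zero vector (an assumption of this sketch).
    """
    rows = [list(TOY_VECTORS.get(tok, [0.0] * DIM)) for tok in tokenize(tweet)]
    rows = rows[:max_len]
    rows += [[0.0] * DIM for _ in range(max_len - len(rows))]
    return rows


def sentence_vector(tweet: str) -> List[float]:
    """A single fixed-length vector per tweet, here the mean of known word vectors.

    Mean pooling is a stand-in; learned sentence embeddings are another option.
    """
    known = [TOY_VECTORS[tok] for tok in tokenize(tweet) if tok in TOY_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(col) / len(known) for col in zip(*known)]


if __name__ == "__main__":
    print(word_matrix("Love this phone", max_len=5))
    print(sentence_vector("love phone"))
```

The word-level form keeps token order, which is what recurrent and convolutional models exploit; the sentence-level form discards order in exchange for a compact, fixed-size input.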
