Developing a Comprehensive Framework for Sentiment Analysis in Turkish

Cem Rifki Aydin

TL;DR

This work tackles sentiment analysis for Turkish, a morphologically rich language, by building a comprehensive framework that integrates unsupervised, semi-supervised, and supervised signals. It introduces a morpheme-level polarity lexicon, a fine-grained morphological analysis, and novel word/document embeddings, alongside an ensemble neural ABSA model that combines recursive and recurrent networks. The thesis demonstrates state-of-the-art results on Turkish datasets and cross-linguistic success on English corpora, including ABSA performance on SemEval-2014 Task 4 and cross-domain sentiment classification. The contributions extend beyond Turkish to general NLP tasks through redefining context windows as subclauses and providing portable methods for morphologically-rich languages.

Abstract

In this thesis, we developed a comprehensive framework for sentiment analysis that takes many aspects of the task into account, mainly for Turkish. We have also proposed several approaches specific to sentiment analysis in English only. Accordingly, we have made five major and three minor contributions. We generated a novel and effective feature set by combining unsupervised, semi-supervised, and supervised metrics. We then fed these features into classical machine learning methods and outperformed neural network models on datasets of different genres in both Turkish and English. We created a polarity lexicon with a semi-supervised, domain-specific method, the first such approach applied to Turkish corpora. We performed a fine-grained morphological analysis for the sentiment classification task in Turkish by determining the polarities of morphemes. This analysis can be adapted to other morphologically rich or agglutinative languages as well. We built a novel neural network architecture for English that combines recurrent and recursive neural network models. We built novel word embeddings that exploit sentiment, syntactic, semantic, and lexical characteristics for both Turkish and English. We also redefined context windows as subclauses when modelling word representations in English. This redefinition can also be applied to other linguistic fields and natural language processing tasks. We have achieved state-of-the-art and significant results for all these original approaches. Our minor contributions include methods related to aspect-based sentiment in Turkish, parameter redefinition in the semi-supervised approach, and aspect term extraction techniques for English. This thesis can be considered the most detailed and comprehensive study of sentiment analysis in Turkish as of July 2020. Our work has also contributed to the opinion classification problem in English.
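
To make the idea of treating subclauses as context windows more concrete, the following is a minimal sketch, not the thesis's actual procedure. The abstract does not say how subclauses are detected, so the delimiter set below (punctuation plus a handful of English conjunctions) and the helper names `subclauses` and `cooccurrences` are assumptions made purely for illustration. The sketch counts word pairs only when both words fall in the same subclause, rather than within a fixed-size sliding window.

```python
import re
from collections import Counter

# Hypothetical subclause delimiters (an assumption, not taken from the thesis):
# common punctuation marks and a few English conjunctions.
_SPLIT = re.compile(r"[,;:.!?]|\b(?:but|because|although|while)\b", re.IGNORECASE)


def subclauses(sentence):
    """Split a sentence into lists of lowercase tokens, one list per subclause."""
    parts = _SPLIT.split(sentence)
    return [p.lower().split() for p in parts if p.strip()]


def cooccurrences(sentence):
    """Count unordered word pairs that appear together in the same subclause."""
    counts = Counter()
    for clause in subclauses(sentence):
        for i, w in enumerate(clause):
            for v in clause[i + 1:]:
                counts[tuple(sorted((w, v)))] += 1
    return counts


if __name__ == "__main__":
    example = "The food was great, but the service was slow."
    print(subclauses(example))
    print(cooccurrences(example))
```

With this kind of window, "great" and "slow" in the example sentence never co-occur, because they belong to different subclauses, whereas a fixed window of several tokens could pair them despite their opposite polarities.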

Paper Structure

This paper contains 92 sections, 10 equations, 16 figures, 20 tables.

Figures (16)

  • Figure 1: The flowchart of the proposed model.
  • Figure 2: Algorithm for unsupervised approach.
  • Figure 3: The visual summary of the semi-supervised approach (ham:16). We tweaked the parameters of this approach and improved the success rates when evaluating the model on Turkish and English datasets.
  • Figure 4: Algorithm for semi-supervised approach.
  • Figure 5: Algorithm for supervised approaches.
  • ...and 11 more figures