Enhancing Sentiment Classification with Machine Learning and Combinatorial Fusion

Sean Patten, Pin-Yu Chen, Christina Schweikert, D. Frank Hsu

TL;DR

This work improves sentiment classification by fusing heterogeneous learners through Combinatorial Fusion Analysis (CFA). It combines RoBERTa with traditional bag-of-words models (SVM, XGBoost, Random Forest) and uses CFA to quantify and exploit cognitive diversity, achieving an IMDb test accuracy of 97.072%, which surpasses both the single models and conventional ensembles. The results indicate that diversity-weighted fusion robustly enhances performance while remaining computationally practical on modest hardware. The study also argues that CFA should generalize beyond IMDb, and suggests future work on expanding the set of base classifiers and analyzing error corrections in depth.

Abstract

This paper presents a novel approach to sentiment classification that applies Combinatorial Fusion Analysis (CFA) to integrate an ensemble of diverse machine learning models, achieving a state-of-the-art accuracy of 97.072% on the IMDb sentiment analysis dataset. CFA leverages the concept of cognitive diversity, which uses rank-score characteristic functions to quantify the dissimilarity between models and strategically combine their predictions. This contrasts with the common practice of scaling up individual models, and is therefore comparatively efficient in its use of computing resources. Experimental results also indicate that CFA outperforms traditional ensemble methods by effectively computing and exploiting model diversity. The approach combines a transformer-based model of the RoBERTa architecture with traditional machine learning models, including Random Forest, SVM, and XGBoost.
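To make the notion of cognitive diversity more concrete, the following is a minimal sketch of a rank-score characteristic (RSC) function and a diversity measure derived from it. It assumes one common formulation from the CFA literature: scores are min-max normalized, the RSC function maps rank i to the i-th largest normalized score, and diversity is the root mean squared difference between two RSC functions. The function names are hypothetical, and the paper's own equations may differ in normalization or scaling.

```python
import numpy as np


def normalize(scores):
    """Min-max normalize scores to [0, 1] (assumed normalization)."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    if span == 0:
        return np.zeros_like(s)
    return (s - s.min()) / span


def rank_score_function(scores):
    """RSC function: entry i-1 holds the normalized score at rank i.

    Rank 1 is the highest score, so the result is non-increasing.
    """
    return np.sort(normalize(scores))[::-1]


def cognitive_diversity(scores_a, scores_b):
    """Root mean squared difference between two RSC functions.

    Both models must score the same number of items.
    """
    f_a = rank_score_function(scores_a)
    f_b = rank_score_function(scores_b)
    if f_a.shape != f_b.shape:
        raise ValueError("both models must score the same items")
    return float(np.sqrt(np.mean((f_a - f_b) ** 2)))
```

Because the RSC function depends only on how a model distributes its scores across ranks, and not on which item receives which rank, two models can agree on many labels and still show nonzero diversity under this measure.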

Paper Structure

This paper contains 12 sections, 8 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Accuracy Performance (Average Methods - ASC, ARC).
  • Figure 2: Accuracy Performance (Performance Methods - WRCP, WRCDS).
  • Figure 3: Accuracy Performance (Diversity Methods - WSCDS, WRCDS).
  • Figure 4: Rank-score Function Graph by Model (Test Dataset). The graph plots the normalized prediction scores against the ranks for each of the four models: RoBERTa (A), SVM (B), XGBoost (C), and Random Forest (D).
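The figure captions refer to several combination methods by acronym only. As a hedged illustration, the sketch below assumes the usual CFA reading of those acronyms: ASC and ARC as the unweighted average score and average rank combinations, and the weighted variants (WSCDS, WRCP, WRCDS) as the same averages with per-model weights taken from performance or diversity strength. The function names and the toy scores are hypothetical, and the weights are passed in rather than derived, since this section does not specify how the paper computes them.

```python
import numpy as np


def to_ranks(scores):
    """Convert scores to ranks, where rank 1 is the highest score.

    Ties are broken by original order (an assumption of this sketch).
    """
    s = np.asarray(scores, dtype=float)
    order = np.argsort(-s, kind="stable")
    ranks = np.empty(len(s), dtype=float)
    ranks[order] = np.arange(1, len(s) + 1)
    return ranks


def score_combination(score_matrix, weights=None):
    """(Weighted) average of scores; rows are models, columns are items."""
    m = np.asarray(score_matrix, dtype=float)
    return np.average(m, axis=0, weights=weights)


def rank_combination(score_matrix, weights=None):
    """(Weighted) average of per-model ranks; lower values are better."""
    m = np.asarray(score_matrix, dtype=float)
    rank_matrix = np.vstack([to_ranks(row) for row in m])
    return np.average(rank_matrix, axis=0, weights=weights)


# Toy scores for two hypothetical models over three items.
scores = np.array([
    [0.9, 0.2, 0.6],  # model A
    [0.7, 0.4, 0.8],  # model B
])

asc = score_combination(scores)  # unweighted score average
arc = rank_combination(scores)   # unweighted rank average
# Weighted variant: model A counts three times as much as model B.
weighted = score_combination(scores, weights=[3.0, 1.0])
```

Score combinations keep the magnitude of each model's confidence, while rank combinations keep only the ordering, which is why the two families can select different items near the decision boundary.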