Bridging Emotions and Architecture: Sentiment Analysis in Modern Distributed Systems
Mahak Shah, Akaash Vishal Hazarika, Meetu Malhotra, Sachin C. Patil, Joshit Mohanty
TL;DR
This paper investigates how sentiment analysis can be combined with distributed systems to process large-scale text data efficiently. It compares a single-node pipeline against a four-node, Spark-based distributed architecture, both using BERT embeddings and logistic regression on the Sentiment140 dataset. Distributed processing reduces processing time by 75% and yields a modest accuracy gain (about 3.5 percentage points) while lowering per-node resource requirements. The work demonstrates the practicality of scalable, real-time-capable NLP pipelines for big data contexts and outlines concrete future directions, including federated learning and hardware acceleration.
Abstract
Sentiment analysis is a subfield of natural language processing (NLP) that has gained importance through its application in areas such as social media surveillance, customer feedback evaluation, and market research. Distributed systems, meanwhile, enable effective processing of large volumes of data. This paper examines how sentiment analysis converges with distributed systems, focusing on approaches, challenges, and directions for future investigation. We also conduct an extensive experiment in which we train sentiment analysis models on both a single-node configuration and a distributed architecture, highlighting the benefits and shortcomings of each in terms of performance and accuracy.
