Leveraging Twitter Data for Sentiment Analysis of Transit User Feedback: An NLP Framework

Adway Das, Abhishek Kumar Prajapati, Pengxiang Zhang, Mukund Srinath, Andisheh Ranjbari

TL;DR

This study presents an NLP framework that leverages Twitter data to analyze transit user feedback, addressing the high cost and limited coverage of traditional surveys. It combines few-shot learning for tweet topic classification with VADER lexicon-based sentiment analysis to sort tweets into four categories (maintenance, scheduling, safety, or other) and to quantify sentiment intensity. Validated on 2022 NYC subway data (36,000 tweets), the approach identifies principal concerns and maps them spatially around stations, with results corroborating a contemporaneous agency survey. The framework offers a scalable, inexpensive means for transit agencies to monitor public sentiment, target improvements, and plan surveys more efficiently, with broad applicability beyond NYC.

Abstract

Traditional methods of collecting user feedback through transit surveys are often time-consuming, resource-intensive, and costly. In this paper, we propose a novel NLP-based framework that harnesses the abundant and inexpensive data available on social media platforms like Twitter to understand users' perceptions of various service issues. Twitter, being a microblogging platform, hosts a wealth of real-time user-generated content that often includes valuable feedback and opinions on various products, services, and experiences. The proposed framework streamlines the gathering and analysis of user feedback, without the need for costly and time-consuming surveys, using two techniques. First, it utilizes few-shot learning to classify tweets into predefined categories, allowing effective identification of the issues described in tweets. It then employs a lexicon-based sentiment analysis model to assess the intensity and polarity of tweet sentiment, distinguishing between positive, negative, and neutral tweets. The effectiveness of the framework was validated on a subset of manually labeled Twitter data, and the framework was applied to the NYC subway system as a case study. The framework accurately classified tweets into predefined categories related to safety, reliability, and maintenance of the subway system and effectively measured sentiment intensities within each category. The general findings were corroborated through a comparison with an agency-run customer survey conducted in the same year. The findings highlight the effectiveness of the proposed framework in gauging user feedback through inexpensive social media data to understand the pain points of the transit system and plan for targeted improvements.
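The lexicon-based sentiment step described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the real VADER lexicon: the word list `TOY_LEXICON` and the helper names `compound_score` and `polarity` are hypothetical, and VADER's handling of negation, intensifiers, punctuation, and capitalization is omitted. The sketch only mirrors two conventions documented for VADER: summed word valences are normalized to a compound score in (-1, 1) as s / sqrt(s^2 + 15), and cutoffs of +0.05 and -0.05 are commonly used to separate positive, negative, and neutral text.

```python
import math
import re

# Hypothetical mini-lexicon with valences on a rough -4..+4 scale.
# These words and scores are illustrative assumptions, not VADER's lexicon.
TOY_LEXICON = {
    "delayed": -1.5,
    "dirty": -2.0,
    "unsafe": -2.5,
    "late": -1.0,
    "clean": 1.8,
    "fast": 1.5,
    "great": 3.0,
}


def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum word valences and squash the total into the open interval (-1, 1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = sum(lexicon.get(token, 0.0) for token in tokens)
    # Same normalization form as VADER's compound score: s / sqrt(s^2 + alpha).
    return total / math.sqrt(total * total + alpha)


def polarity(compound, threshold=0.05):
    """Map a compound score to a label using the conventional +/-0.05 cutoffs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in [
        "The train was delayed and the station is dirty",
        "Great ride today, clean and fast",
        "Taking the subway downtown",
    ]:
        score = compound_score(tweet)
        print(f"{polarity(score):8s} {score:+.3f}  {tweet}")
```

In this sketch the magnitude of the compound score stands in for sentiment intensity and its sign for polarity; a tweet with no lexicon words scores exactly zero and is labeled neutral.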

Paper Structure

This paper contains 16 sections, 4 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: MTA Subway Annual Ridership Numbers From 2012 To 2022
  • Figure 2: Illustration of The Area of Interest for Data Collection and Some Critical Locations of Manhattan Borough, NYC
  • Figure 3: The Filtering and Processing Steps Adopted for Data Collection.
  • Figure 4: The Process of VADER Sentiment Analysis
  • Figure 5: Comparison of Predicted and Ground Truth Labels for 500 Random Samples.
  • ...and 3 more figures