QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums

Varun Nagaraj Rao, Eesha Agarwal, Samantha Dalal, Dan Calacci, Andrés Monroy-Hernández

TL;DR

QuaLLM is an LLM-based framework that translates large-scale online forum text into structured, survey-like quantitative insights. It combines generation, classification, aggregation, and prevalence prompts with a hybrid human-computational evaluation strategy, scaling thematic analysis to over one million Reddit comments from rideshare worker communities. The case study identifies four core concern themes and reports empirical metrics on factuality, completeness, and topic representation, highlighting both the promise and the limitations of AI-assisted quantitative analysis. The approach offers a scalable pathway for surfacing policy-relevant worker concerns from online forums, with broader applicability across domains, while underscoring the need for thoughtful human evaluation and safeguards.

Abstract

Online discussion forums provide crucial data to understand the concerns of a wide range of real-world communities. However, the typical qualitative and quantitative methodologies used to analyze those data, such as thematic analysis and topic modeling, are either infeasible to scale or require significant human effort to translate outputs into human-readable forms. This study introduces QuaLLM, a novel LLM-based framework to analyze and extract quantitative insights from text data on online forums. The framework consists of a novel prompting and human evaluation methodology. We applied this framework to analyze over one million comments from two of Reddit's rideshare worker communities, marking the largest study of its type. We uncover significant worker concerns regarding AI and algorithmic platform decisions, responding to regulatory calls about worker insights. In short, our work sets a new precedent for AI-assisted quantitative data analysis to surface concerns from online forums.

Paper Structure

This paper contains 21 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 1: QuaLLM transforms large-scale unstructured text-based online forum discussions on platforms like Reddit and Facebook into a structured survey-style format, identifying top-level themes associated with prevalence ranked sub-themes (by frequency of occurrence) and representative quotes.
  • Figure 2: QuaLLM's multiphase prompting attempts to evoke the steps that human analysts might perform.
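
The survey-style output described in Figure 1 (top-level themes, each with sub-themes ranked by frequency of occurrence and a representative quote) can be illustrated with a small sketch. This is not the authors' implementation: it assumes comments have already been assigned a theme and sub-theme by an earlier classification step, and the names `LabeledComment` and `survey_style_summary`, the choice of the first-seen comment as the representative quote, and the placeholder labels in the example are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledComment:
    """Hypothetical record: one forum comment after a classification step."""
    theme: str
    sub_theme: str
    text: str


def survey_style_summary(comments):
    """Group comments by theme and rank each theme's sub-themes by frequency.

    Returns {theme: [row, ...]} where rows are sorted by descending count and
    each row holds the sub-theme, its count, its share within the theme, and
    one representative quote (assumption: the first comment seen).
    """
    counts = defaultdict(Counter)
    quotes = {}
    for c in comments:
        counts[c.theme][c.sub_theme] += 1
        # Keep the first comment seen for each (theme, sub-theme) pair.
        quotes.setdefault((c.theme, c.sub_theme), c.text)

    summary = {}
    for theme, sub_counts in counts.items():
        total = sum(sub_counts.values())
        summary[theme] = [
            {
                "sub_theme": sub,
                "count": n,
                "share": n / total,
                "quote": quotes[(theme, sub)],
            }
            # most_common() yields sub-themes from most to least frequent.
            for sub, n in sub_counts.most_common()
        ]
    return summary


# Placeholder data only; these labels are not findings from the paper.
example = [
    LabeledComment("theme A", "sub-theme 1", "placeholder quote 1"),
    LabeledComment("theme A", "sub-theme 2", "placeholder quote 2"),
    LabeledComment("theme A", "sub-theme 2", "placeholder quote 3"),
    LabeledComment("theme B", "sub-theme 3", "placeholder quote 4"),
]
summary = survey_style_summary(example)
```

In this sketch prevalence is a plain within-theme frequency count. How the framework itself computes prevalence and selects quotes is specified in the paper's prompting methodology, not here.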