LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media

Haiqi Zhang; Zhengyuan Zhu; Zeyu Zhang; Chengkai Li

TL;DR

The paper tackles the problem of organizing the flood of factual claims on social media by constructing a multi-level taxonomy. It introduces LLMTaxo, an LLM-driven framework that builds a three-tier taxonomy with broad $T_b$, medium $T_m$, and detailed $T_d$ topics using seed taxonomies and in-context learning, plus a workflow for claim detection and distinct-claim consolidation. The authors present a dedicated taxonomy evaluation suite and claim-topic evaluation, demonstrating that GPT-4o mini and Gemini offer strong performance, with seed taxonomy and careful consolidation improving readability and reducing redundancy across three diverse domains. The work provides a reusable data/code release, evaluates scalability and robustness, and discusses ethical considerations, bias, and the potential impact on fact-checking workflows. Overall, LLMTaxo offers a scalable path toward automated, multi-granularity organization of online factual claims to support researchers, fact-checkers, and AI-assisted information navigation.

Abstract

With the rapid expansion of content on social media platforms, analyzing and comprehending online discourse has become increasingly complex. This paper introduces LLMTaxo, a novel framework leveraging large language models for the automated construction of taxonomies of factual claims from social media by generating topics at multiple levels of granularity. The resulting hierarchical structure significantly reduces redundancy and improves information accessibility. We also propose dedicated taxonomy evaluation metrics to enable comprehensive assessment. Evaluations conducted on three diverse datasets demonstrate LLMTaxo's effectiveness in producing clear, coherent, and comprehensive taxonomies. Among the evaluated models, GPT-4o mini consistently outperforms others across most metrics. The framework's flexibility and low reliance on manual intervention underscore its potential for broad applicability.
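To make the multi-level structure concrete, the following minimal sketch shows one way claims could be nested under the broad, medium, and detailed topic levels named in the TL;DR. It is an illustration only, not the authors' implementation: the `TopicPath` class, the `build_taxonomy` helper, and the example claims and topic names are all hypothetical, and the LLM step that would assign topics is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicPath:
    """One claim's position in the taxonomy (hypothetical field names)."""
    broad: str      # corresponds to T_b
    medium: str     # corresponds to T_m
    detailed: str   # corresponds to T_d


def build_taxonomy(assignments):
    """Nest claims as broad -> medium -> detailed -> [claims]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for claim, path in assignments:
        tree[path.broad][path.medium][path.detailed].append(claim)
    return tree


# Made-up claims and topics for illustration; not from the paper's datasets.
assignments = [
    ("Claim A", TopicPath("Health", "Vaccines", "Side effects")),
    ("Claim B", TopicPath("Health", "Vaccines", "Side effects")),
    ("Claim C", TopicPath("Health", "Nutrition", "Supplements")),
]

taxonomy = build_taxonomy(assignments)
for broad, mediums in taxonomy.items():
    print(broad)
    for medium, details in mediums.items():
        print("  " + medium)
        for detailed, claims in details.items():
            print(f"    {detailed}: {len(claims)} claim(s)")
```

Grouping claims that share a detailed topic under a single node is what lets a reader browse from general themes down to specific claims instead of scanning a flat list.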
