PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments

Shangrui Nie, Kian Omoomi, Lucie Flek, Zhixue Zhao, Charles Welch

TL;DR

PERSPECTRA addresses the lack of pluralism-aware evaluation for large language models with a scalable benchmark that merges Kialo's structured pro/con debates with the linguistic richness of Reddit discussions. The authors build a retrieval-and-expansion pipeline that links topic–opinion pairs to Reddit comments and then generates five naturalistic expansions per pair, yielding 3,810 expanded arguments (762 opinions × 5 expansions) on 100 topics. They formalize three downstream tasks (opinion counting, opinion matching, and polarity check) and analyze model performance on each, revealing systematic challenges: opinion overestimation in counting, semantic overlap in matching, and concession traps in polarity. The dataset, evaluation protocol, and prompts are released to support reproducibility. The results indicate significant headroom for pluralism-aware reasoning in current models and are meant to enable future methods that preserve viewpoint diversity in generation and alignment.

Abstract

Pluralism, the capacity to engage with diverse perspectives without collapsing them into a single viewpoint, is critical for developing large language models that faithfully reflect human heterogeneity. Yet this characteristic has not been carefully examined in the LLM research community and remains absent from most alignment studies. Debate-oriented sources provide a natural entry point for pluralism research. Previous work builds on online debate sources but remains constrained by costly human validation. Other debate-rich platforms such as Reddit and Kialo also offer promising material: Reddit provides linguistic diversity and scale but lacks clear argumentative structure, while Kialo supplies explicit pro/con graphs but remains overly concise and detached from natural discourse. We introduce PERSPECTRA, a pluralist benchmark that integrates the structural clarity of Kialo debate graphs with the linguistic diversity of real Reddit discussions. Using a controlled retrieval-and-expansion pipeline, we construct 3,810 enriched arguments spanning 762 pro/con stances on 100 controversial topics. Each opinion is expanded into multiple naturalistic variants, enabling robust evaluation of pluralism. We initialise three tasks with PERSPECTRA: opinion counting (identifying distinct viewpoints), opinion matching (aligning supporting stances and discourse to source opinions), and polarity check (inferring aggregate stance in mixed discourse). Experiments with state-of-the-art open-source and proprietary LLMs highlight systematic failures, such as overestimating the number of viewpoints and misclassifying concessive structures, underscoring the difficulty of pluralism-aware understanding and reasoning. By combining diversity with structure, PERSPECTRA establishes the first scalable, configurable benchmark for evaluating how well models represent, distinguish, and reason over multiple perspectives.
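
To make the opinion-counting failure mode concrete, the following is a minimal sketch of how overestimation could be quantified. The function name `counting_errors` and the three summary statistics are assumptions made for illustration; they are not taken from the paper's evaluation protocol, which may use different metrics.

```python
def counting_errors(gold: list[int], predicted: list[int]) -> dict[str, float]:
    """Summarise opinion-counting predictions against gold counts.

    Hypothetical metrics, not the paper's protocol:
      exact_match       - share of items where the predicted count is correct
      signed_error      - mean of (predicted - gold); positive means the model
                          reports more distinct opinions than are present
      overestimate_rate - share of items where predicted > gold
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")

    diffs = [p - g for g, p in zip(gold, predicted)]
    n = len(diffs)
    return {
        "exact_match": sum(1 for d in diffs if d == 0) / n,
        "signed_error": sum(diffs) / n,
        "overestimate_rate": sum(1 for d in diffs if d > 0) / n,
    }


if __name__ == "__main__":
    # Toy numbers invented for the example, not results from the paper.
    print(counting_errors(gold=[2, 2, 4, 3], predicted=[3, 2, 6, 3]))
```

A signed error is used here rather than an absolute one because the abstract describes a directional bias (too many viewpoints reported), which an absolute error would hide.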

Paper Structure

This paper contains 42 sections, 1 figure, and 5 tables.

Figures (1)

  • Figure 1: Overview of the dataset construction pipeline. Kialo debates (topics and opinions) are paired with Reddit comments via retrieval, then expanded through controlled prompting to produce naturalistic argument variants. The resulting dataset contains structured opinions enriched with Reddit-based phrasings.
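
The pipeline in Figure 1 can be sketched roughly as follows. This is an illustrative outline only: the `Opinion` record, the function names, and the Jaccard-overlap retriever are all assumptions, since this page does not specify the retriever or the prompts the authors used. The expansion step is a placeholder that builds prompt strings where a real pipeline would call a language model.

```python
from dataclasses import dataclass, field


@dataclass
class Opinion:
    """Hypothetical record for one Kialo opinion under a debate topic."""
    topic: str
    text: str
    polarity: str  # "pro" or "con", following Kialo's debate structure
    expansions: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercased whitespace tokens; deliberately crude."""
    return set(text.lower().split())


def retrieve(opinion: Opinion, comments: list[str], k: int = 2) -> list[str]:
    """Return the k comments with the highest Jaccard overlap with the
    topic and opinion text.

    Assumption: lexical overlap stands in for whatever retriever the
    paper actually uses, to keep the sketch dependency-free.
    """
    query = tokenize(opinion.topic + " " + opinion.text)

    def score(comment: str) -> float:
        tokens = tokenize(comment)
        union = query | tokens
        return len(query & tokens) / len(union) if union else 0.0

    return sorted(comments, key=score, reverse=True)[:k]


def expand(opinion: Opinion, retrieved: list[str], n_variants: int = 5) -> list[str]:
    """Placeholder for the controlled-prompting step.

    Builds one prompt string per variant; a real pipeline would send
    each prompt to an LLM and store the generated argument instead.
    """
    context = " | ".join(retrieved)
    return [
        f"[variant {i + 1}] Rephrase '{opinion.text}' in the style of: {context}"
        for i in range(n_variants)
    ]


def build_dataset(opinions: list[Opinion], comments: list[str],
                  n_variants: int = 5) -> list[Opinion]:
    """Attach n_variants expansions to every opinion (modifies in place)."""
    for op in opinions:
        op.expansions = expand(op, retrieve(op, comments), n_variants)
    return opinions
```

With five variants per opinion, the number of expanded arguments is five times the number of opinions, which matches the reported 762 × 5 = 3,810.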