From Competition to Collaboration: Designing Sustainable Mechanisms Between LLMs and Online Forums

Niv Fono, Yftah Ziser, Omer Ben-Porat

TL;DR

The paper addresses the sustainability clash between GenAI systems and data-driven online forums by modeling their interaction as a sequential, asymmetric-information game where a GenAI provider proposes questions and a forum curates publication. It introduces a non-monetary collaboration framework, formalizes utilities and a Nash-product-based benchmark, and analyzes utility recovery under full-information and heuristic strategies using real Stack Exchange data and diverse LLMs. Empirical results show a systematic misalignment between model-learning value and forum engagement, yet the proposed heuristics recover substantial fractions of the ideal joint utility (approximately 46–52% for GenAI and 55–66% for the forum under full information), with asymmetric-information settings further enhancing practical gains. The findings suggest that lightweight, acceptance-aware collaboration can sustain knowledge sharing and data quality for AI systems without compromising forum autonomy or trust, while highlighting limitations and directions for future work on multi-agent and nonlinear utility considerations.

Abstract

While Generative AI (GenAI) systems draw users away from (Q&A) forums, they also depend on the very data those forums produce to improve their performance. Addressing this paradox, we propose a framework of sequential interaction, in which a GenAI system proposes questions to a forum that can publish some of them. Our framework captures several intricacies of such a collaboration, including non-monetary exchanges, asymmetric information, and incentive misalignment. We bring the framework to life through comprehensive, data-driven simulations using real Stack Exchange data and commonly used LLMs. We demonstrate the incentive misalignment empirically, yet show that players can achieve roughly half of the utility in an ideal full-information scenario. Our results highlight the potential for sustainable collaboration that preserves effective knowledge sharing between AI systems and human knowledge platforms.

Paper Structure (41 sections, 3 theorems, 14 equations, 2 figures, 7 tables)

Key Result

Theorem 3.2

The Nash product optimization problem (the paper's equation labeled eq:Nash product) is NP-hard.
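
To give a feel for the combinatorial structure behind this hardness result, the following is a minimal brute-force sketch on a toy instance. It rests on assumptions that are not taken from the paper: each player's utility is additive and non-negative over the published questions, the disagreement point is zero, and at most `max_published` questions can be published. The function and parameter names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def best_nash_product(
    genai_values: Sequence[float],
    forum_values: Sequence[float],
    max_published: int,
) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively search for the subset of questions maximizing the Nash product.

    Assumptions (illustrative only, not the paper's exact formulation):
    - utilities are additive over the selected questions,
    - all values are non-negative and the disagreement point is (0, 0),
    - at most `max_published` questions are selected.
    """
    n = len(genai_values)
    best_set: Tuple[int, ...] = ()
    best_value = 0.0
    for k in range(1, max_published + 1):
        # The number of candidate subsets grows combinatorially with n and k,
        # which is why exhaustive search only works on toy instances.
        for subset in combinations(range(n), k):
            u_genai = sum(genai_values[i] for i in subset)
            u_forum = sum(forum_values[i] for i in subset)
            product = u_genai * u_forum
            if product > best_value:
                best_set, best_value = subset, product
    return best_set, best_value


# Toy instance: questions 0 and 1 favor the GenAI, questions 2 and 3 favor the forum.
GENAI_VALUES = [5, 4, 1, 1]
FORUM_VALUES = [1, 1, 4, 5]
best_set, best_value = best_nash_product(GENAI_VALUES, FORUM_VALUES, max_published=2)
```

In this toy instance the search pairs one question each side values highly, rather than two questions favored by the same side, which is the balancing behavior a product objective rewards.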

Figures (2)

  • Figure 1: Iterative interaction between a GenAI provider and an online Q&A forum. In each round, the GenAI ranks user-generated questions by their expected model-learning value and submits the top $M$ to the forum. The forum then applies its own selection rule $\mathcal{R}$, publishing only those questions that align with its community objectives. Feedback from the published posts informs both sides in subsequent rounds.
  • Figure 2: Relationship between question perplexity and normalized ViewCount across five StackExchange domains. Each plot reports the Spearman correlation coefficient $\rho$. A general pattern of negative correlation emerges, highlighting systematic misalignment between forum engagement and LLM uncertainty.
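
The two captions above can be sketched together on toy data: one interaction round (the GenAI submits its top $M$ questions, the forum filters them with a rule $\mathcal{R}$), followed by a Spearman correlation between the two value signals. This is a minimal sketch under stated assumptions: perplexity stands in for model-learning value, a known engagement score stands in for ViewCount, and the threshold rule, the `Question` fields, and the numbers are all hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Question:
    qid: str
    perplexity: float  # assumed proxy for model-learning value
    engagement: float  # assumed proxy for normalized ViewCount


def propose_top_m(pool: Sequence[Question], m: int) -> List[Question]:
    """GenAI step: rank by the learning-value proxy and submit the top m."""
    return sorted(pool, key=lambda q: q.perplexity, reverse=True)[:m]


def forum_publish(
    proposed: Sequence[Question], rule: Callable[[Question], bool]
) -> List[Question]:
    """Forum step: publish only the proposed questions accepted by the rule."""
    return [q for q in proposed if rule(q)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy pool in which high-perplexity questions tend to attract less engagement.
POOL = [
    Question("q1", 9.0, 10.0),
    Question("q2", 7.0, 40.0),
    Question("q3", 5.0, 20.0),
    Question("q4", 3.0, 80.0),
    Question("q5", 1.0, 100.0),
]

proposed = propose_top_m(POOL, m=3)
published = forum_publish(proposed, rule=lambda q: q.engagement >= 20.0)
rho = spearman_rho(
    [q.perplexity for q in POOL], [q.engagement for q in POOL]
)
```

On this toy pool the forum rejects the question the GenAI ranks highest, and the rank correlation between the two signals is negative, which mirrors the kind of misalignment the Figure 2 caption describes. In a real setting the forum would not observe engagement before publication, so the rule would have to rely on a prediction.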

Theorems & Definitions (6)

  • Remark 3.1
  • Theorem 3.2
  • Theorem B.1
  • Proof
  • Proposition B.2
  • Proof of Proposition B.2