LLM Benchmark-User Need Misalignment for Climate Change

Oucheng Liu, Lexing Xie, Jing Jiang

Abstract

Climate change is a major socio-scientific issue that shapes public decision-making and policy discussions. As large language models (LLMs) increasingly serve as an interface for accessing climate knowledge, whether existing benchmarks reflect user needs is critical for evaluating LLMs in real-world settings. We propose a Proactive Knowledge Behaviors Framework that captures the different knowledge-seeking and knowledge-provision behaviors in human-human and human-AI interactions. We further develop a Topic-Intent-Form taxonomy and apply it to analyze climate-related data representing different knowledge behaviors. Our results reveal a substantial mismatch between current benchmarks and real-world user needs, while knowledge interaction patterns between humans and LLMs closely resemble those in human-human interactions. These findings provide actionable guidance for benchmark design, RAG system development, and LLM training. Code is available at https://github.com/OuchengLiu/LLM-Misalign-Climate-Change.

Paper Structure

This paper contains 46 sections, 2 equations, 31 figures, 14 tables.

Figures (31)

  • Figure 1: Our Proactive Knowledge Behaviors Framework. The proactive knowledge behaviors between three key actors are shown as blue arrows and the red arrows reflect our analytical logic.
  • Figure 2: Pipelines for data annotation in Topic Identification and Question Type Classification.
  • Figure 3: Topic comparison between Human-to-AI Queries and Human-to-AI Guidance Knowledge. (a) Pairwise topic-distribution similarities across the five datasets; (b) Probabilities of the 10 most diverging topics, i.e., those topics with the highest absolute probability differences under the two groups; (c) Topic-distribution similarities between each dataset and each group; (d) Probability differences of the most diverging topics. The interpretations and presentations of Figures 4 and 5 follow the same logic.
  • Figure 4: Comparison of user intents between Human-to-AI Queries and Human-to-AI Guidance Knowledge.
  • Figure 5: Comparison of expected answer forms between Human-to-AI Queries and Human-to-AI Guidance Knowledge.
  • ...and 26 more figures