
Climate Change from Large Language Models

Hongyin Zhu, Prayag Tiwari

TL;DR

This paper tackles the challenge of evaluating climate-crisis knowledge in large language models (LLMs). It introduces an automated evaluation framework that builds a diverse climate Q&A corpus via data synthesis and manual collection, then assesses model outputs with a structured set of metrics spanning 10 perspectives. Experiments across high-performing LLMs reveal that, while LLMs hold substantial climate knowledge, their responses suffer from timeliness gaps, underscoring the need for continual content updates. The work provides a scalable, explainable approach and lays the groundwork for online climate-crisis knowledge services that deliver real-time, expert-level Q&A to the public.

Abstract

Climate change poses grave challenges, demanding widespread understanding and low-carbon lifestyle awareness. Large language models (LLMs) offer a powerful tool to address this crisis, yet comprehensive evaluations of their climate-crisis knowledge are lacking. This paper proposes an automated evaluation framework to assess climate-crisis knowledge within LLMs. We adopt a hybrid approach for data acquisition, combining data synthesis and manual collection, to compile a diverse set of questions encompassing various aspects of climate change. Utilizing prompt engineering based on the compiled questions, we evaluate the model's knowledge by analyzing its generated answers. Furthermore, we introduce a comprehensive set of metrics to assess climate-crisis knowledge, encompassing indicators from 10 distinct perspectives. These metrics provide a multifaceted evaluation, enabling a nuanced understanding of the LLMs' climate crisis comprehension. The experimental results demonstrate the efficacy of our proposed method. In our evaluation utilizing diverse high-performing LLMs, we discovered that while LLMs possess considerable climate-related knowledge, there are shortcomings in terms of timeliness, indicating a need for continuous updating and refinement of their climate-related content.
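
The evaluation loop described in the abstract (prompting evaluators about a question-answer pair and scoring it from 10 perspectives) might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the perspective names, the 0 to 10 scale, the prompt wording, the averaging rule, and the names `build_prompt`, `evaluate_pair`, and `stub_evaluators` are all hypothetical, and the evaluators are stubs standing in for real LLM calls.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical placeholders: this section does not name the ten perspectives.
PERSPECTIVES: List[str] = [f"perspective_{i}" for i in range(1, 11)]

# An evaluator takes a prompt and returns a numeric score (assumed 0-10 scale).
Evaluator = Callable[[str], float]


def build_prompt(question: str, answer: str, perspective: str) -> str:
    """Build a scoring prompt for one perspective (assumed wording)."""
    return (
        f"Rate the following answer on '{perspective}' from 0 to 10.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Score:"
    )


def evaluate_pair(
    question: str, answer: str, evaluators: Dict[str, Evaluator]
) -> Dict[str, float]:
    """Average the evaluators' scores for each perspective."""
    scores: Dict[str, float] = {}
    for perspective in PERSPECTIVES:
        prompt = build_prompt(question, answer, perspective)
        scores[perspective] = mean(ev(prompt) for ev in evaluators.values())
    return scores


# Stub evaluators returning fixed scores; real ones would query an LLM.
stub_evaluators: Dict[str, Evaluator] = {
    "model_a": lambda prompt: 8.0,
    "model_b": lambda prompt: 6.0,
}

scores = evaluate_pair(
    "What drives sea-level rise?",
    "Thermal expansion of seawater and melting land ice.",
    stub_evaluators,
)
overall = mean(scores.values())
print(f"{len(scores)} perspectives, overall score {overall:.1f}")
```

Averaging across several evaluators is one simple way to dampen the bias of any single model; the paper may combine scores differently.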

Paper Structure (17 sections, 5 equations, 4 figures, 4 tables)

This paper contains 17 sections, 5 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The schematic diagram of the proposed climate crisis knowledge evaluation framework
  • Figure 2: An illustration of utilizing multiple LLMs to automatically evaluate a question-answer pair in the context of climate change
  • Figure 3: Visualization of question quality evaluation, with circles closer to the center indicating lower overall scores assigned by the model
  • Figure 4: Visualization of answer quality evaluation, with circles positioned closer to the center indicating lower overall scores assigned by the model