
CAIRNS: Balancing Readability and Scientific Accuracy in Climate Adaptation Question Answering

Liangji Kong, Aditya Joshi, Sarvnaz Karimi

TL;DR

CAIRNS tackles the challenge of making climate adaptation knowledge both readable and scientifically grounded by coupling location-aware retrieval with structured prompting (ScholarGuide) and a kappa-weighted hybrid evaluator that aligns automated and expert judgments. It introduces a ReAct-style loop for iterative evidence gathering and verification, and demonstrates that the ScholarGuide approach yields stronger structure and verifiability while maintaining accuracy. Although data-augmented retrieval shows limited benefit at present, the framework achieves strong average performance and provides a scalable, expert-aligned evaluation method without fine-tuning. The work advances practical, region-aware climate QA with an emphasis on traceable evidence and reproducible evaluation.
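The TL;DR mentions a ReAct-style loop for iterative evidence gathering and verification. The summary does not give the loop's prompts, tools, or stopping rule. What follows is therefore only a minimal sketch of the generic ReAct control flow (reason, act with a tool, observe, repeat), built on several assumptions. The `policy` function stands in for an LLM call. The tool names `search_literature` and `get_climate` are hypothetical placeholders for the literature and climate-data sources mentioned in the abstract. `react_loop` and `scripted_policy` are illustrative names, not the authors' code.

```python
from typing import Callable, Dict, List, Tuple

# (thought, action, observation) records kept across steps.
Step = Tuple[str, str, str]
# Hypothetical policy: stands in for an LLM that reads the question and the trace
# so far, then returns (thought, action_name, action_argument).
Policy = Callable[[str, List[Step]], Tuple[str, str, str]]
# Hypothetical tool: takes a query string and returns an observation string.
Tool = Callable[[str], str]


def react_loop(question: str, policy: Policy, tools: Dict[str, Tool],
               max_steps: int = 5) -> Tuple[str, List[Step]]:
    """Generic ReAct-style loop: reason, act with a tool, observe, repeat."""
    trace: List[Step] = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, trace)
        if action == "finish":
            # The policy judged the gathered evidence sufficient to answer.
            return arg, trace
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown tool: {action}"
        # Feed the observation back so the next step can verify or extend it.
        trace.append((thought, action, observation))
    return "no answer within step budget", trace


# Usage with a scripted (non-LLM) policy and stub tools, all hypothetical.
def scripted_policy(question: str, trace: List[Step]) -> Tuple[str, str, str]:
    if len(trace) == 0:
        return ("Need published evidence", "search_literature", question)
    if len(trace) == 1:
        return ("Need local climate data", "get_climate", "example-region")
    return ("Evidence gathered", "finish", "Draft answer citing [1] and [2]")


stub_tools: Dict[str, Tool] = {
    "search_literature": lambda q: f"[1] abstract matching '{q}'",
    "get_climate": lambda region: f"[2] rainfall series for {region}",
}

answer, trace = react_loop("Which wheat varieties tolerate drought?",
                           scripted_policy, stub_tools)
for thought, action, observation in trace:
    print(f"{action}: {observation}")
print(answer)
```

The step budget (`max_steps`) is an assumed safeguard that keeps the loop from running indefinitely when the policy never decides to finish.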

Abstract

Climate adaptation strategies are proposed in response to climate change and are practised in agriculture to sustain food production. These strategies can be found in unstructured data (for example, scientific literature from the Elsevier website) or in structured data (for example, heterogeneous climate data accessed via government APIs). We present Climate Adaptation question-answering with Improved Readability and Noted Sources (CAIRNS), a framework that enables experts (farmer advisors) to obtain credible preliminary answers from complex evidence sources on the web. CAIRNS enhances readability and citation reliability through a structured ScholarGuide prompt, and achieves robust evaluation via a consistency-weighted hybrid evaluator that leverages the agreement of LLM evaluators with human experts. Together, these components enable readable, verifiable, and domain-grounded question answering without fine-tuning or reinforcement learning. Using a previously reported dataset of expert-curated question-answer pairs, we show that CAIRNS outperforms the baselines on most metrics. Our thorough ablation study confirms these results across all metrics. To validate our LLM-based evaluation, we also report an analysis of its correlation with human judgments.
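The abstract describes a consistency-weighted hybrid evaluator (kappa-weighted, in the TL;DR) built on agreement between LLM evaluators and human experts. The exact weighting rule is not given in this summary, so the following is a minimal sketch of one assumed scheme:

1. Compute Cohen's κ between each LLM evaluator and the expert on a calibration set.
2. Clip negative κ values to zero.
3. Normalise the κ values into weights.
4. Score a new answer as the weighted average of the evaluators' scores.

The function names, evaluator names, and ratings are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters scoring the same items on a categorical scale."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)    # agreement expected by chance
    if p_e == 1.0:
        return 1.0  # convention: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def kappa_weights(evaluator_scores: Dict[str, List[int]],
                  expert_scores: List[int]) -> Dict[str, float]:
    """Assumed scheme: weight each LLM evaluator by its kappa with the expert,
    clipping negative kappas to zero and normalising weights to sum to one."""
    kappas = {name: max(0.0, cohens_kappa(scores, expert_scores))
              for name, scores in evaluator_scores.items()}
    total = sum(kappas.values())
    if total == 0.0:
        # No evaluator agrees with the expert beyond chance: fall back to uniform.
        return {name: 1.0 / len(kappas) for name in kappas}
    return {name: k / total for name, k in kappas.items()}


def hybrid_score(item_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Kappa-weighted average of the evaluators' scores for one new answer."""
    return sum(weights[name] * score for name, score in item_scores.items())


# Illustrative calibration ratings (1-5 on one dimension), not data from the paper.
expert = [3, 4, 2, 5, 4, 3]
evaluators = {
    "llm_a": [3, 4, 2, 5, 4, 3],   # matches the expert on every item
    "llm_b": [3, 3, 2, 4, 4, 2],   # matches the expert on half of the items
}
weights = kappa_weights(evaluators, expert)
print(weights)
print(hybrid_score({"llm_a": 4, "llm_b": 3}, weights))
```

Plain Cohen's κ treats the 1-5 ratings as nominal categories, which matches the statistic named in the figure captions. A weighted κ would instead give partial credit for near-misses on an ordinal scale.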

Paper Structure

This paper contains 15 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overview of the CAIRNS framework.
  • Figure 2: ScholarGuide Prompt: structured reasoning for readable and verifiable answers.
  • Figure 3: Zero-shot agreement between LLM evaluators and human experts across seven dimensions, measured by Cohen's $\kappa$ with 95% confidence intervals (CIs).
  • Figure 4: Few-shot agreement between LLM evaluators and human experts across seven dimensions (Cohen's $\kappa$ with 95% CIs).
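Figures 3 and 4 report Cohen's κ between LLM evaluators and human experts with 95% confidence intervals. The summary does not state how those intervals were obtained. The sketch below assumes a nonparametric percentile bootstrap over rated items, which is one common choice and not necessarily the authors' procedure. `bootstrap_kappa_ci` and the example ratings are hypothetical.

```python
import random
from collections import Counter
from typing import Sequence, Tuple


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters scoring the same items on a categorical scale."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)    # chance agreement
    if p_e == 1.0:
        return 1.0  # convention: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def bootstrap_kappa_ci(a: Sequence[int], b: Sequence[int], n_boot: int = 2000,
                       alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate and percentile-bootstrap (1 - alpha) CI for Cohen's kappa.

    Items are resampled with replacement, keeping each (a[i], b[i]) pair together.
    """
    if len(a) != len(b) or len(a) == 0 or n_boot < 1:
        raise ValueError("need equal-length, non-empty ratings and n_boot >= 1")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohens_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    # Percentile indices into the sorted bootstrap distribution.
    lo = stats[round((alpha / 2) * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return cohens_kappa(a, b), lo, hi


# Illustrative ratings on one dimension (not data from the paper).
llm = [3, 4, 2, 5, 4, 3, 2, 4, 5, 3]
human = [3, 4, 3, 5, 4, 3, 2, 3, 5, 3]
kappa, lo, hi = bootstrap_kappa_ci(llm, human)
print(f"kappa = {kappa:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Resampling whole items, rather than individual ratings, preserves the pairing between the LLM and human scores. That pairing is what κ measures.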