Hierarchical AI-Meteorologist: LLM-Agent System for Multi-Scale and Explainable Weather Forecast Reporting
Daniil Sukhorukov, Andrei Zakharov, Nikita Glazkov, Katsiaryna Yanchanka, Vladimir Kirilin, Maxim Dubovitsky, Roman Sultimov, Yuri Maksimov, Ilya Makarov
TL;DR
The paper introduces the Hierarchical AI-Meteorologist, an LLM-agent pipeline that interprets forecast tables at hourly, 6-hourly, and daily scales to produce coherent, verifiable weather reports. It combines a structured four-part output (summary, proof, keywords, warnings) with a keyword-based validation layer, and the proof block anchors each report to the underlying data. Using OpenWeather and Meteostat data, the approach shows improved interpretability and robustness across diverse climates, supported by a reproducible RESTful architecture and a two-stage reasoning process (Meteorologist, then Writer). The method advances explainable meteorological reporting by delivering cross-scale narratives that can be verified against aggregates and are suitable for domain-specific decision support. In practice, it offers a reproducible framework for semantic evaluation and a scalable path toward operational, self-checking weather reporting.
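As a rough illustration of the four-part output and the keyword check described above, the sketch below models a report as a small data structure and tests each keyword against daily aggregates. The class name `WeatherReport`, the `KEYWORD_RULES` table, its thresholds, and the aggregate field names are all assumptions made for this example; the paper's actual schema and validation rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class WeatherReport:
    """Hypothetical container for the four-part output."""
    summary: str
    proof: list[str]      # statements that cite values from the forecast tables
    keywords: list[str]   # short labels for the dominant weather events
    warnings: list[str] = field(default_factory=list)


# Illustrative rules only: each keyword maps to a predicate over one day's aggregates.
KEYWORD_RULES = {
    "rain": lambda day: day["precip_total_mm"] >= 1.0,
    "frost": lambda day: day["temp_min_c"] <= 0.0,
    "strong wind": lambda day: day["wind_max_ms"] >= 15.0,
}


def validate_keywords(report, daily):
    """Return True (supported), False (unsupported), or None (no rule) per keyword."""
    result = {}
    for keyword in report.keywords:
        rule = KEYWORD_RULES.get(keyword)
        if rule is None:
            result[keyword] = None  # keyword cannot be checked automatically
        else:
            # Supported if at least one day in the forecast satisfies the rule.
            result[keyword] = any(rule(day) for day in daily)
    return result


daily = [{"precip_total_mm": 4.2, "temp_min_c": 3.0, "wind_max_ms": 6.0}]
report = WeatherReport(
    summary="Wet and mild, with light winds.",
    proof=["Daily precipitation total: 4.2 mm", "Minimum temperature: 3.0 C"],
    keywords=["rain", "frost", "heatwave"],
)
checks = validate_keywords(report, daily)
```

Here `rain` is supported by the aggregates, `frost` is not, and `heatwave` has no rule, so a pipeline built this way could flag the last two for review rather than accept the report as is.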
Abstract
We present the Hierarchical AI-Meteorologist, an LLM-agent system that generates explainable weather reports using hierarchical forecast reasoning and weather keyword generation. Unlike standard approaches that treat forecasts as flat time series, our framework performs multi-scale reasoning across hourly, 6-hourly, and daily aggregations to capture both short-term dynamics and long-term trends. Its core reasoning agent converts structured meteorological inputs into coherent narratives while simultaneously extracting a small set of keywords that summarize the dominant meteorological events. These keywords serve as semantic anchors for validating the consistency, temporal coherence, and factual alignment of the generated reports. Using OpenWeather and Meteostat data, we demonstrate that hierarchical context and keyword-based validation substantially improve the interpretability and robustness of LLM-generated weather narratives, offering a reproducible framework for the semantic evaluation of automated meteorological reporting and advancing agent-based scientific reasoning.
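The multi-scale view can be pictured as successive aggregations of one hourly table. The following is a minimal sketch under stated assumptions: the record fields (`time`, `temp_c`, `precip_mm`), the `aggregate` helper, and the synthetic data are hypothetical, and the statistics chosen (mean, min, max, total) are one plausible set rather than the aggregates used in the paper.

```python
from datetime import datetime, timedelta
from statistics import mean


def aggregate(hourly, hours_per_bin):
    """Group hourly records into midnight-aligned bins of `hours_per_bin` hours."""
    bins = {}
    for rec in hourly:
        t = rec["time"]
        # Floor the timestamp to the start of its bin.
        start = t.replace(
            hour=(t.hour // hours_per_bin) * hours_per_bin,
            minute=0, second=0, microsecond=0,
        )
        bins.setdefault(start, []).append(rec)
    return [
        {
            "start": start,
            "temp_mean": round(mean(r["temp_c"] for r in recs), 1),
            "temp_min": min(r["temp_c"] for r in recs),
            "temp_max": max(r["temp_c"] for r in recs),
            "precip_total": round(sum(r["precip_mm"] for r in recs), 1),
        }
        for start, recs in sorted(bins.items())
    ]


# Synthetic 48-hour forecast: temperature follows the hour of day,
# with rain falling between 12:00 and 18:00 each day.
t0 = datetime(2025, 1, 1)
hourly = [
    {
        "time": t0 + timedelta(hours=h),
        "temp_c": float(h % 24),
        "precip_mm": 0.5 if 12 <= h % 24 < 18 else 0.0,
    }
    for h in range(48)
]

six_hourly = aggregate(hourly, 6)   # 8 blocks
daily = aggregate(hourly, 24)       # 2 blocks
```

Because the coarser tables are derived from the same hourly records, totals stay consistent across scales (the four 6-hour precipitation totals of a day sum to that day's total), which is what makes a narrative checkable against the aggregates.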
