Pixels and Predictions: Potential of GPT-4V in Meteorological Imagery Analysis and Forecast Communication

John R. Lawson, Joseph E. Trujillo-Falcón, David M. Schultz, Montgomery L. Flora, Kevin H. Goebbert, Seth N. Lyman, Corey K. Potvin, Adam J. Stepanek

TL;DR

This paper evaluates GPT-4V's capacity to create plausible severe-weather outlooks and to communicate hazards in both Spanish and English from weather charts and text inputs. It advocates cautious AI integration, emphasizing the need for human oversight and reliable, trustworthy output.

Abstract

Generative AI, such as OpenAI's GPT-4V large language model, has rapidly entered mainstream discourse. Novel capabilities in image processing and natural-language communication may augment existing forecasting methods. Large language models further display potential to better communicate weather hazards in a style honed for diverse communities and different languages. This study evaluates GPT-4V's ability to interpret meteorological charts and communicate weather hazards appropriately to the user, despite the challenge of hallucinations, where generative AI delivers coherent, confident, but incorrect responses. We assess GPT-4V's competence via its web interface ChatGPT in two tasks: (1) generating a severe-weather outlook from weather-chart analysis and conducting self-evaluation, revealing an outlook that corresponds well with a Storm Prediction Center human-issued forecast; and (2) producing hazard summaries in Spanish and English from weather charts. Responses in Spanish, however, resemble direct (not idiomatic) translations from English, yielding poorly translated summaries that lose the idiomatic precision required for optimal communication. Our findings advocate for cautious integration of tools like GPT-4V in meteorology, underscoring the necessity of human oversight and development of trustworthy, explainable AI.

Paper Structure

This paper contains 11 sections and 7 figures.

Figures (7)

  • Figure 1: Three pairs of weather charts taken from a full set given to GPT-4V. Images reproduced with kind permission of Pivotal Weather, LLC. Geopotential height and wind speed at 300 hPa (a,b); simulated composite reflectivity (c,d); dry-bulb temperature at 850 hPa (e,f). Left column is GFS (a,c,e); right column is NAM (b,d,f).
  • Figure 2: GPT-4V response to a collection of weather charts. We subjectively highlight vague/incorrect responses in red and useful/correct sections in blue.
  • Figure 3: Conversation snippet with (a) MUCAPE maps and request, (b) response. We have removed discussion of maps not shown in Fig. \ref{fig:spc_maps_1}.
  • Figure 4: Conversation snippet showing GPT-4V outlook and corresponding SPC-issued outlook for same period.
  • Figure 5: "Wisdom of crowds" method of self-evaluation of the outlook in Fig. 4a, with GPT-4V having been provided the human-issued equivalent in Fig. 4b.
  • ...and 2 more figures