Detection and Measurement of Hailstones with Multimodal Large Language Models

Moritz Alker, David C. Schedl, Andreas Stöckl

TL;DR

The paper addresses measuring hailstone sizes from crowd-sourced social-media imagery using pretrained multimodal large language models. It evaluates four multimodal LLMs with direct (one-stage) and two-stage prompting, leveraging reference objects and contextual cues to infer diameter. The best configuration, GPT-4o with two-stage prompting, achieves a mean absolute error of about $1.12$ cm and a correlation of $r \approx 0.71$; the two-stage approach reduces errors by roughly $18.6\%$ and substantially lowers the number of missed estimates. These findings suggest that off-the-shelf multimodal models can complement traditional hail sensors, enabling faster, higher-density assessments for nowcasting, though automated real-time image harvesting remains a future requirement.

Abstract

This study examines the use of social media and news images to detect and measure hailstones, utilizing pre-trained multimodal large language models. The dataset for this study comprises 474 crowdsourced images of hailstones from documented hail events in Austria, which occurred between January 2022 and September 2024. These hailstones have maximum diameters ranging from 2 to 11 cm. We estimate the hail diameters and compare four different models utilizing one-stage and two-stage prompting strategies. The latter utilizes additional size cues from reference objects, such as human hands, within the image. Our results show that pretrained models already have the potential to measure hailstone diameters from images, with a mean absolute error of 1.12 cm for the best model. In comparison to a single-stage prompt, two-stage prompting improves the reliability of most models. Our study suggests that these off-the-shelf models, even without fine-tuning, can complement traditional hail sensors by extracting meaningful and spatially dense information from social media imagery, enabling faster and more detailed assessments of severe weather events. The automated real-time image harvesting from social media and other sources remains an open task, but it will make our approach directly applicable to future hail events.
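
The evaluation quantities named in the abstract and TL;DR (mean absolute error, correlation, and missed estimates) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that a "miss" is an image for which the model returned no usable diameter (represented here as `None`) and that misses are excluded from the error and correlation statistics; the function names are hypothetical.

```python
from math import sqrt


def valid_pairs(truth, pred):
    """Keep only (truth, prediction) pairs where the model gave an estimate."""
    return [(t, p) for t, p in zip(truth, pred) if p is not None]


def miss_count(pred):
    """Number of images without a usable estimate (assumed meaning of 'miss')."""
    return sum(p is None for p in pred)


def mean_absolute_error(truth, pred):
    """MAE in the same unit as the inputs (cm here), ignoring misses."""
    pairs = valid_pairs(truth, pred)
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def pearson_r(truth, pred):
    """Pearson correlation between ground truth and estimates, ignoring misses."""
    pairs = valid_pairs(truth, pred)
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_p = sum((p - mean_p) ** 2 for _, p in pairs)
    return cov / sqrt(var_t * var_p)
```

Whether misses are dropped or penalized changes the reported MAE, so a comparison between prompting strategies should state the miss count alongside the error.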

Paper Structure

This paper contains 12 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Examples of hailstone images from the dataset showing different viewing distances and reference objects.
  • Figure 2: Histogram of 474 ground-truth hailstone diameters in our dataset. Differing colors indicate the distribution of close-up hailstones and hail in the distance.
  • Figure 3: Histogram showing the number of misses per model and prompt.
  • Figure 4: Ground-truth versus G4 P2 estimates. The dashed line denotes perfect agreement.
  • Figure 5: Mean-absolute error (MAE) of G4 P2 grouped by reference object.
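
A per-reference-object breakdown like the one Figure 5 describes could be computed along the following lines. This is a hedged sketch, not the authors' code: the record layout `(reference_object, truth_cm, estimate_cm)` and the label strings are illustrative assumptions, and images without an estimate are skipped.

```python
from collections import defaultdict


def mae_by_reference(records):
    """Group absolute errors by reference object and average each group.

    records: iterable of (reference_object, truth_cm, estimate_cm) tuples,
    where estimate_cm is None if the model gave no estimate (hypothetical layout).
    """
    errors = defaultdict(list)
    for reference, truth, estimate in records:
        if estimate is None:
            continue  # a miss contributes no error value
        errors[reference].append(abs(truth - estimate))
    return {ref: sum(errs) / len(errs) for ref, errs in errors.items()}
```

Groups with only a few images yield noisy averages, so reporting the group size next to each MAE makes such a breakdown easier to interpret.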