HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images

Kundan Thota, Xuanhao Mu, Thorsten Schlachter, Veit Hagenmeyer

TL;DR

Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.

Abstract

Accurate heat-demand maps play a crucial role in decarbonizing space heating, yet most municipalities lack the detailed building-level data needed to calculate them. We introduce HeatPrompt, a zero-shot vision-language energy modeling framework that estimates annual heat demand using semantic features extracted from satellite images, basic Geographic Information System (GIS) data, and building-level features. We instruct pretrained Large Vision Language Models (VLMs), via a domain-specific prompt, to act as an energy planner and extract visual attributes that correspond to the thermal load, such as roof age and building density, from the RGB satellite image. A Multi-Layer Perceptron (MLP) regressor trained on these captions shows an $R^2$ uplift of 93.7% and reduces the mean absolute error (MAE) by 30% compared to the baseline model. Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.
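
To make the two reported metrics concrete, the following is a minimal sketch of how $R^2$, MAE, and a relative change against a baseline could be computed. It assumes that "uplift" means the relative change over the baseline value, which the abstract does not spell out. The helper names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the target."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def relative_change(new, old):
    """Signed change of `new` relative to `old`, in percent (assumed definition)."""
    return 100.0 * (new - old) / abs(old)


# Toy numbers, invented for illustration only (not data from the paper).
y_true = [100.0, 200.0, 300.0, 400.0]
y_base = [200.0, 200.0, 300.0, 300.0]   # hypothetical baseline predictions
y_model = [120.0, 190.0, 310.0, 380.0]  # hypothetical improved predictions

r2_uplift = relative_change(r2_score(y_true, y_model), r2_score(y_true, y_base))
mae_change = relative_change(mae(y_true, y_model), mae(y_true, y_base))
```

Under this definition a positive `r2_uplift` and a negative `mae_change` both indicate an improvement over the baseline.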

Paper Structure

This paper contains 15 sections, 3 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Energy-relevant features such as solar-ready surfaces, green roof areas, reflective roofing, and surrounding vegetation that support heat demand estimation.
  • Figure 2: Illustration of HeatPrompt's semantic feature extraction. An isoline mask (red boundary) is first overlaid on the 512×512 px satellite image, defining the region of interest. Then, a VLM is prompted to extract visual attributes and associated confidence scores from this RGBA composite. These semantic features are then used for heat-demand regression.
  • Figure 3: Semantic trends across heat demand terciles. Visual signals, such as tree cover and roof condition, correlate with modeled demand.
  • Figure 4: Visual comparison of similar buildings showing semantic cues influencing heat demand.
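
The masking step described in the Figure 2 caption could be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the caption does not say how the alpha channel is used, so the sketch keeps pixels inside the isoline opaque, makes pixels outside it transparent, and draws the boundary in red. The function names, the `outside_alpha` parameter, and the tiny 8×8 tile (standing in for a 512×512 px image) are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Toggle for every edge crossed by a horizontal ray going right.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def composite_rgba(rgb, polygon, outside_alpha=0):
    """Return RGBA rows: red boundary, opaque inside, `outside_alpha` elsewhere."""
    h, w = len(rgb), len(rgb[0])
    # Test each pixel centre against the isoline polygon.
    mask = [[point_in_polygon(c + 0.5, r + 0.5, polygon) for c in range(w)]
            for r in range(h)]

    def is_inside(r, c):
        return 0 <= r < h and 0 <= c < w and mask[r][c]

    out = []
    for r in range(h):
        row = []
        for c in range(w):
            if not mask[r][c]:
                row.append((*rgb[r][c], outside_alpha))
            elif not all(is_inside(r + dr, c + dc)
                         for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))):
                row.append((255, 0, 0, 255))  # boundary pixel drawn in red
            else:
                row.append((*rgb[r][c], 255))
        out.append(row)
    return out


# Tiny 8x8 grey tile standing in for a 512x512 satellite image.
tile = [[(128, 128, 128)] * 8 for _ in range(8)]
square = [(2, 2), (6, 2), (6, 6), (2, 6)]  # hypothetical isoline, pixel coords
rgba = composite_rgba(tile, square)
```

A real pipeline would more likely rely on an imaging library for rasterization; the pure-Python version is used here only to keep the sketch self-contained.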