HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images

Kundan Thota; Xuanhao Mu; Thorsten Schlachter; Veit Hagenmeyer

TL;DR

Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.

Abstract

Accurate heat-demand maps play a crucial role in decarbonizing space heating, yet most municipalities lack the detailed building-level data needed to compute them. We introduce HeatPrompt, a zero-shot vision-language energy modeling framework that estimates annual heat demand from semantic features extracted from satellite images, combined with basic Geographic Information System (GIS) and building-level features. We give pretrained Large Vision-Language Models (VLMs) a domain-specific prompt that instructs them to act as energy planners and to extract visual attributes relevant to thermal load, such as roof age and building density, from RGB satellite images. A Multi-Layer Perceptron (MLP) regressor trained on these captions achieves a 93.7% uplift in $R^2$ and reduces the mean absolute error (MAE) by 30% compared to the baseline model. Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.
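The abstract reports its gains as changes relative to a baseline model. Reading "a 93.7% uplift in $R^2$" as a relative increase is our interpretation, since the abstract does not state the absolute scores. The arithmetic can be sketched as follows. The helper names (`mean_absolute_error`, `r_squared`, `relative_change`) and the baseline values in the usage lines are hypothetical and serve only to illustrate the percentages.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute deviation between observed and predicted heat demand."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline: float, new: float) -> float:
    """Signed relative change of `new` with respect to `baseline` (0.937 means +93.7%)."""
    return (new - baseline) / abs(baseline)


# Hypothetical baseline values, NOT taken from the paper.
baseline_r2, baseline_mae = 0.30, 100.0
heatprompt_r2 = baseline_r2 * (1 + 0.937)   # 93.7% relative uplift, about 0.581
heatprompt_mae = baseline_mae * (1 - 0.30)  # 30% reduction, 70.0

print(round(relative_change(baseline_r2, heatprompt_r2), 3))    # 0.937
print(round(relative_change(baseline_mae, heatprompt_mae), 3))  # -0.3
```

A relative uplift does not fix the absolute level: the same 93.7% would follow from any baseline $R^2$, so the absolute scores are needed to judge how much variance the model explains.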

Paper Structure

This paper contains 15 sections, 3 equations, 4 figures, 2 tables, and 1 algorithm.

Introduction
Related Work
  Heat Demand Mapping and Simulation Approaches
  Machine Learning for Energy Demand Prediction
  Remote Sensing and VLMs in Urban Analytics
Proposed Method
  Dataset Acquisition
  HeatPrompt: Semantic Feature Extraction
  Regression on Semantic Embeddings
Results and Discussion
  Structured Feature Regression Performance
  Impact of Semantic Features from Vision Models
  Semantic Trends and Visual Interpretability
  Qualitative Comparison: Role of Visual Cues
Conclusion

Figures (4)

Figure 1: Energy-relevant features such as solar-ready surfaces, green roof areas, reflective roofing, and surrounding vegetation that support heat demand estimation.
Figure 2: Illustration of HeatPrompt's semantic feature extraction. An isoline mask (red boundary) is first overlaid on the 512×512 px satellite image, defining the region of interest. Then, a VLM is prompted to extract visual attributes and associated confidence scores from this RGBA composite. These semantic features are then used for heat‐demand regression.
Figure 3: Semantic trends across heat demand terciles. Visual signals, such as tree cover and roof condition, correlate with modeled demand.
Figure 4: Visual comparison of similar buildings showing semantic cues influencing heat demand.
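Figure 2 describes overlaying an isoline mask on the 512×512 px satellite tile to form an RGBA composite that is then passed to the VLM. The sketch below shows one plausible way to build such a composite. It assumes the isoline is available as a polygon in pixel coordinates and that the mask is encoded in the alpha channel (255 inside the region of interest, a configurable value outside). The polygon format, the function names `point_in_polygon` and `rgba_composite`, and the `outside_alpha` parameter are illustrative assumptions, not HeatPrompt's actual code.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
RGBAPixel = Tuple[int, int, int, int]
Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting: True if (x, y) lies inside the closed polygon `poly`."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count the crossing if it lies to the right of the query point.
            if x < x_cross:
                inside = not inside
    return inside


def rgba_composite(rgb: List[List[Pixel]], isoline: Sequence[Point],
                   outside_alpha: int = 0) -> List[List[RGBAPixel]]:
    """Hypothetical helper: append an alpha channel marking the isoline region.

    Assumption: alpha = 255 inside the isoline polygon, `outside_alpha` elsewhere.
    Each pixel is tested at its centre (col + 0.5, row + 0.5).
    """
    out: List[List[RGBAPixel]] = []
    for row_idx, row in enumerate(rgb):
        new_row: List[RGBAPixel] = []
        for col_idx, (r, g, b) in enumerate(row):
            inside = point_in_polygon(col_idx + 0.5, row_idx + 0.5, isoline)
            new_row.append((r, g, b, 255 if inside else outside_alpha))
        out.append(new_row)
    return out


# Usage: a 4x4 grey tile with a square region of interest in the middle.
tile = [[(120, 120, 120)] * 4 for _ in range(4)]
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
composite = rgba_composite(tile, square)
print(sum(1 for row in composite for px in row if px[3] == 255))  # 4
```

Pure-Python loops keep the sketch dependency-free. For full 512×512 tiles and detailed isolines, a vectorized array library or a GIS rasterization tool would be the more practical choice.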
