Solar PV Installation Potential Assessment on Building Facades Based on Vision and Language Foundation Models
Ruyu Liu, Dongxu Zhuang, Jianhua Zhang, Arega Getaneh Abate, Per Sieverts Nielsen, Ben Wang, Xiufeng Liu
TL;DR
SF-SPA is a four-stage pipeline that quantifies building-façade PV potential from a single street-view image, using semantics-guided rectification, zero-shot façade parsing, and LLM-guided spatial reasoning to output installable PV layouts and energy yields. It combines geometry, vision-language models, and energy simulation (pvlib) to produce metrically valid PV layouts without 3D data or domain-specific training. Validated on 80 buildings across four countries, it achieves an average area error of 6.2% (±2.8%) at roughly 100 s per building. Key contributions are: (i) semantics-guided geometric rectification using semantic keypoints, (ii) zero-shot façade parsing with vision-language models, (iii) a structured prompt chain for LLM-based PV layout reasoning, and (iv) pvlib-based irradiance and energy simulations incorporating weather data and module parameters. The framework supports urban energy planning and BIPV deployment, including scalable, city-wide façade PV screening. Acknowledged limitations are the reliance on 2D data, rectification failures, and LLM latency; future work targets 3D data integration and automated scaling.
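As a rough illustration of how an installable area translates into an energy yield, the sketch below uses a simple area × efficiency × irradiation × performance-ratio estimate. This is a back-of-envelope stand-in, not the pvlib-based simulation used in SF-SPA. The function name, the parameter names, the default performance ratio of 0.8, and the example values are all assumptions made for illustration.

```python
def estimate_annual_yield_kwh(
    installable_area_m2: float,
    module_efficiency: float,
    annual_poa_irradiation_kwh_m2: float,
    performance_ratio: float = 0.8,  # assumed default, not taken from the paper
) -> float:
    """Back-of-envelope annual PV yield for a facade (hypothetical helper).

    installable_area_m2: facade area judged suitable for PV modules.
    module_efficiency: module conversion efficiency as a fraction (0..1].
    annual_poa_irradiation_kwh_m2: yearly irradiation on the facade plane.
    performance_ratio: lumped system losses as a fraction (0..1].
    """
    if installable_area_m2 < 0 or annual_poa_irradiation_kwh_m2 < 0:
        raise ValueError("area and irradiation must be non-negative")
    if not 0 < module_efficiency <= 1:
        raise ValueError("module_efficiency must be in (0, 1]")
    if not 0 < performance_ratio <= 1:
        raise ValueError("performance_ratio must be in (0, 1]")
    return (
        installable_area_m2
        * module_efficiency
        * annual_poa_irradiation_kwh_m2
        * performance_ratio
    )


if __name__ == "__main__":
    # Illustrative inputs only: 40 m^2 of facade, 20% efficient modules,
    # 700 kWh/m^2 per year on the vertical plane.
    yield_kwh = estimate_annual_yield_kwh(40.0, 0.20, 700.0)
    print(f"Estimated annual yield: {yield_kwh:.0f} kWh")
```

A full simulation would replace the single irradiation figure with hourly plane-of-array irradiance derived from weather data and the façade orientation, which is the role pvlib plays in the pipeline.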
Abstract
Building facades represent a significant untapped resource for solar energy generation in dense urban environments, yet assessing their photovoltaic (PV) potential remains challenging due to complex geometries and semantic components. This study introduces SF-SPA (Semantic Facade Solar-PV Assessment), an automated framework that transforms street-view photographs into quantitative PV deployment assessments. The approach combines computer vision and artificial intelligence techniques to address three key challenges: perspective distortion correction, semantic understanding of facade elements, and spatial reasoning for PV layout optimization. Our four-stage pipeline processes images through geometric rectification, zero-shot semantic segmentation, Large Language Model (LLM) guided spatial reasoning, and energy simulation. Validation across 80 buildings in four countries demonstrates robust performance with mean area estimation errors of 6.2% ± 2.8% compared to expert annotations. The automated assessment requires approximately 100 seconds per building, a substantial gain in efficiency over manual methods. Simulated energy yield predictions confirm the method's reliability and applicability for regional potential studies, urban energy planning, and building-integrated photovoltaic (BIPV) deployment. Code is available at: https://github.com/CodeAXu/Solar-PV-Installation
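An area-error figure reported as "mean ± spread" can be computed along the following lines. This is a minimal sketch under stated assumptions: the error is taken to be the absolute relative difference between predicted and expert-annotated installable area, and the spread is taken to be the sample standard deviation. The abstract specifies neither choice, and the function names and numbers below are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def relative_area_errors(
    predicted_m2: Sequence[float], reference_m2: Sequence[float]
) -> list[float]:
    """Absolute relative area error per building, in percent (assumed metric)."""
    if len(predicted_m2) != len(reference_m2):
        raise ValueError("predicted and reference lists must have equal length")
    if any(r <= 0 for r in reference_m2):
        raise ValueError("reference areas must be positive")
    return [
        abs(p - r) / r * 100.0 for p, r in zip(predicted_m2, reference_m2)
    ]


def summarize_errors(errors_pct: Sequence[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the per-building errors."""
    if len(errors_pct) < 2:
        raise ValueError("need at least two buildings for a standard deviation")
    return mean(errors_pct), stdev(errors_pct)


if __name__ == "__main__":
    # Hypothetical areas in square metres, not data from the paper.
    predicted = [95.0, 210.0, 52.0]
    reference = [100.0, 200.0, 50.0]
    errors = relative_area_errors(predicted, reference)
    avg, spread = summarize_errors(errors)
    print(f"Area error: {avg:.1f}% ± {spread:.1f}%")
```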
