CrossVIT-augmented Geospatial-Intelligence Visualization System for Tracking Economic Development Dynamics
Yanbing Bai, Jinhua Su, Bin Qiao, Xiaoran Ma
TL;DR
Senseconomic tackles the challenge of producing timely, high-resolution geospatial economic indicators by fusing satellite and street-view imagery through a Vision Transformer-based cross-attention framework, with nighttime-light data serving as weak supervision. The system is deployed with scalable Spark-based distributed computing and a Vue3-based frontend for interactive county-level visualization, enabling efficient data processing and decision support for policymakers. The authors report an $R^2$ of $0.8363$ for county-level economic predictions and a halving of processing time, to 23 minutes, with distributed computing, which has practical implications for rapid economic monitoring. Overall, the work contributes an end-to-end, multimodal, scalable pipeline for geospatial economic analysis with policy-relevant visualization capabilities.
Abstract
Timely and accurate economic data is crucial for effective policymaking. Current challenges in data timeliness and spatial resolution can be addressed with advancements in multimodal sensing and distributed computing. We introduce Senseconomic, a scalable system for tracking economic dynamics via multimodal imagery and deep learning. Built on the Transformer framework, it integrates remote sensing and street view images using cross-attention, with nighttime light data as weak supervision. The system achieved an R-squared value of 0.8363 in county-level economic predictions and halved processing time to 23 minutes using distributed computing. Its user-friendly design includes a Vue3-based front end with Baidu Maps for visualization and a Python-based back end that automates tasks such as image downloads and preprocessing. Senseconomic empowers policymakers and researchers with efficient tools for resource allocation and economic planning.
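To make the cross-attention fusion step more concrete, the following is a minimal single-head sketch in plain Python. It is not the authors' implementation: the token counts, embedding dimension, random projection matrices, and all function and variable names (`cross_attention`, `satellite_tokens`, `street_tokens`, and so on) are assumptions chosen for illustration. It shows only the general mechanism, in which queries come from one modality and keys and values come from the other.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a nested-list matrix."""
    return [list(col) for col in zip(*a)]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention.

    Queries are projected from one modality (query_tokens); keys and values
    are projected from the other modality (context_tokens).
    """
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = math.sqrt(len(q[0]))  # scaled dot-product attention
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def rand_matrix(rows, cols):
    """Random Gaussian matrix, standing in for learned weights or embeddings."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim = 4  # assumed embedding size, for illustration only

# Hypothetical inputs: 3 satellite patch embeddings, 5 street-view embeddings.
satellite_tokens = rand_matrix(3, dim)
street_tokens = rand_matrix(5, dim)
w_q, w_k, w_v = (rand_matrix(dim, dim) for _ in range(3))

# Satellite tokens attend over street-view tokens.
fused, weights = cross_attention(satellite_tokens, street_tokens, w_q, w_k, w_v)
```

Here `fused` holds one output vector per satellite token, each a weighted mix of projected street-view tokens, and each row of `weights` sums to 1. In a trained model the projection matrices would be learned, and a regression head on the fused features could be supervised by nighttime-light values, as the abstract describes.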
