Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook
Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang
TL;DR
The survey examines how deep learning enables cross-domain data fusion in urban computing, organizing the field with a three-dimensional taxonomy of data sources, fusion methods, and applications. It foregrounds four fusion paradigms (feature-based, alignment-based, contrast-based, and generation-based) and discusses their concrete realizations (e.g., graph-based fusion, cross-modal attention, contrastive learning, diffusion, and LLM-enhanced methods). The paper highlights deep-learning-driven applications across urban planning, transportation, economy, public safety, society, environment, and energy, and emphasizes the emerging role of LLMs and large multimodal models in this space. It also outlines data, methodological, and practical challenges and offers directions such as privacy-preserving learning, open benchmarks, and computationally efficient architectures to advance urban intelligence.
Abstract
As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., geographical, traffic, social media, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recent years have seen a rising trend of using various deep-learning methods to facilitate cross-domain data fusion in smart cities. To this end, we present the first survey that systematically reviews the latest advancements in deep learning-based data fusion methods tailored for urban computing. Specifically, we first take a data perspective to understand the role of each modality and data source. Second, we classify the methodology into four primary categories: feature-based, alignment-based, contrast-based, and generation-based fusion methods. Third, we categorize multi-modal urban applications into seven types: urban planning, transportation, economy, public safety, society, environment, and energy. Compared with previous surveys, we focus more on the synergy of deep learning methods with urban computing applications. Furthermore, we shed light on the interplay between Large Language Models (LLMs) and urban computing, postulating future research directions that could revolutionize the field. We believe that the taxonomy, progress, and prospects delineated in our survey will significantly enrich the research community. A comprehensive and up-to-date paper list can be found at https://github.com/yoshall/Awesome-Multimodal-Urban-Computing.
