Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook

Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang

TL;DR

This survey examines how deep learning enables cross-domain data fusion in urban computing, organizing the literature with a three-dimensional taxonomy of data sources, fusion methods, and applications. It identifies four fusion paradigms (feature-based, alignment-based, contrast-based, and generation-based) and discusses their concrete realizations (e.g., graph-based fusion, cross-modal attention, contrastive learning, diffusion, and LLM-enhanced methods). The paper reviews deep learning applications across urban planning, transportation, economy, public safety, society, environment, and energy, and emphasizes the emerging role of LLMs and large multimodal models in this space. It also outlines data, methodological, and practical challenges and suggests directions such as privacy-preserving learning, open benchmarks, and computationally efficient architectures to advance urban intelligence.

Abstract

As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., geographical, traffic, social media, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recently, we have witnessed a rising trend of using various deep learning methods to facilitate cross-domain data fusion in smart cities. To this end, we present the first survey that systematically reviews the latest advancements in deep learning-based data fusion methods tailored for urban computing. First, we examine the data perspective to understand the role of each modality and data source. Second, we classify the methodology into four primary categories: feature-based, alignment-based, contrast-based, and generation-based fusion methods. Third, we categorize multi-modal urban applications into seven types: urban planning, transportation, economy, public safety, society, environment, and energy. Compared with previous surveys, we focus more on the synergy of deep learning methods with urban computing applications. Furthermore, we shed light on the interplay between Large Language Models (LLMs) and urban computing, postulating future research directions that could revolutionize the field. We firmly believe that the taxonomy, progress, and prospects delineated in our survey can significantly enrich the research community. A comprehensive and up-to-date paper list can be found at https://github.com/yoshall/Awesome-Multimodal-Urban-Computing.

Paper Structure (43 sections, 13 equations, 27 figures, 3 tables)

Figures (27)

  • Figure 1: A sketch of cross-domain urban computing. Left: It involves the integration of urban data from diverse modalities, including spatio-temporal, visual, textual, and other modalities, through the process of data fusion. Right: Generally, these urban data derive from multiple sources, such as geographical data, transportation, social media, demography, and environment.
  • Figure 2: Times cited and publications over time for Deep Learning in Urban Computing in prestigious venues (Source: Web of Science).
  • Figure 3: The taxonomy framework for deep learning-based cross-domain data fusion in urban computing in our survey. The framework is structured around three dimensions: data, fusion method, and application. Within each perspective, we categorize existing research into different categories to provide a comprehensive and well-organized review.
  • Figure 4: Proportion of dataset type among highly related papers within the scope of cross-domain data fusion in urban computing.
  • Figure 5: Distribution of dataset usage frequency from different cities (i.e., bars) and countries (i.e., colors) across highly related papers within the scope of this survey. Note that cities used in fewer than four papers are omitted from this illustration for simplicity.
  • ...and 22 more figures