Urban Region Representation Learning with Attentive Fusion
Fengze Sun, Jianzhong Qi, Yanchuan Chang, Xiaoliang Fan, Shanika Karunasekera, Egemen Tanin
TL;DR
HAFusion tackles urban region representation learning by jointly modeling multiple feature views (mobility, POIs, and land use) with a dual-feature attentive fusion module (DAFusion) and a hybrid attentive feature learning module (HALearning). View- and region-level attention modules capture higher-order correlations within and across views, and a memory-augmented inter-view mechanism enables efficient cross-view interactions. Empirical results on NYC, CHI, and SF across crime, check-in, and service-call prediction tasks show consistent improvements of up to 31% in $R^2$; integrating DAFusion into existing multi-view models also improves them by up to about 36%. The approach offers a practical, generic fusion framework for urban analytics and similar multi-view representation learning tasks.
Abstract
An increasing number of related urban data sources have brought forth novel opportunities for learning urban region representations, i.e., embeddings. The embeddings describe latent features of urban regions and enable discovering similar regions for urban planning applications. Existing methods learn a separate embedding of a region from each type of region feature data, and subsequently fuse the learned embeddings into a unified region embedding. However, these studies often overlook the significance of the fusion process. Typical fusion methods rely on simple aggregation, such as summation and concatenation, thereby disregarding correlations within the fused region embeddings. To address this limitation, we propose a novel model named HAFusion. Our model is powered by a dual-feature attentive fusion module named DAFusion, which fuses embeddings from different region features to learn higher-order correlations between the regions as well as between the different types of region features. DAFusion is generic: it can be integrated into existing models to enhance their fusion process. Further, motivated by the effective fusion capability of an attentive module, we propose a hybrid attentive feature learning module named HALearning to enhance the embedding learning from each individual type of region feature. Extensive experiments on three real-world datasets demonstrate that our model HAFusion outperforms state-of-the-art methods across three different prediction tasks. Using our learned region embeddings leads to consistent improvements of up to 31% in prediction accuracy.
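To make the contrast between simple aggregation and attentive fusion concrete, the following is a minimal NumPy sketch. It is not the authors' DAFusion implementation: the function names (`sum_fusion`, `concat_fusion`, `attentive_fusion`), the tensor shapes, the mean pooling over views, and the use of plain scaled dot-product self-attention without learned projection weights are all illustrative assumptions. It only shows how attention applied first across views and then across regions lets each fused embedding depend on the others, which summation and concatenation cannot do.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    # Scaled dot-product self-attention over the second-to-last axis.
    # Assumption: no learned query/key/value projections, to keep the sketch short.
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x


def sum_fusion(views):
    # views: (n_views, n_regions, d) -> (n_regions, d)
    return views.sum(axis=0)


def concat_fusion(views):
    # views: (n_views, n_regions, d) -> (n_regions, n_views * d)
    return np.concatenate(list(views), axis=-1)


def attentive_fusion(views):
    # Hypothetical two-step fusion, loosely following the abstract's description.
    # Step 1 (view level): for each region, let its view embeddings attend to each other.
    per_region = np.swapaxes(views, 0, 1)   # (n_regions, n_views, d)
    mixed = self_attention(per_region)      # (n_regions, n_views, d)
    fused = mixed.mean(axis=1)              # (n_regions, d), assumed pooling
    # Step 2 (region level): let the fused region embeddings attend to each other.
    return self_attention(fused)            # (n_regions, d)


# Toy input: 3 views (e.g. mobility, POI, land use), 5 regions, 8-dimensional embeddings.
rng = np.random.default_rng(0)
views = rng.normal(size=(3, 5, 8))

print(sum_fusion(views).shape)
print(concat_fusion(views).shape)
print(attentive_fusion(views).shape)
```

In this sketch, summation and concatenation treat every region and view independently, whereas the attentive variant mixes information across views and then across regions before producing each region's embedding. A trainable version would add learned projections and be optimized end to end.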
