CureGraph: Contrastive Multi-Modal Graph Representation Learning for Urban Living Circle Health Profiling and Prediction

Jinlin Li, Xiao Zhou

TL;DR

CureGraph tackles elderly health profiling in urban environments by learning low-dimensional, spatially aware embeddings from multi-modal data within 15-minute living circles. It combines photos, textual reviews of residential areas, and reviews of surrounding points of interest (POIs) through modality-specific encoders and a spatial multimodal graph network (SMGCN) that models cross-modal and local spatial dependencies, guided by a spatial autocorrelation matrix. Across four geriatric conditions (MCI, hypertension, diabetes, MDD) in Beijing and Shanghai, the model improves $R^2$ over the best baseline by about $28\%$ on average and performs strongly at multiple spatial scales. The learned embeddings support urban-health analysis and public policy decisions. Code is available at https://github.com/jinlin2021/CureGraph.

Abstract

The early detection and prediction of health status decline among the elderly at the neighborhood level are of great significance for urban planning and public health policymaking. While existing studies affirm the connection between living environments and health outcomes, most rely on single data modalities or simplistic feature concatenation of multi-modal information, limiting their ability to comprehensively profile the health-oriented urban environments. To fill this gap, we propose CureGraph, a contrastive multi-modal representation learning framework for urban health prediction that employs graph-based techniques to infer the prevalence of common chronic diseases among the elderly within the urban living circles of each neighborhood. CureGraph leverages rich multi-modal information, including photos and textual reviews of residential areas and their surrounding points of interest, to generate urban neighborhood embeddings. By integrating pre-trained visual and textual encoders with graph modeling techniques, CureGraph captures cross-modal spatial dependencies, offering a comprehensive understanding of urban environments tailored to elderly health considerations. Extensive experiments on real-world datasets demonstrate that CureGraph improves the best baseline by $28\%$ on average in terms of $R^2$ across elderly disease risk prediction tasks. Moreover, the model enables the identification of stage-wise chronic disease progression and supports comparative public health analysis across neighborhoods, offering actionable insights for sustainable urban development and enhanced quality of life. The code is publicly available at https://github.com/jinlin2021/CureGraph.

Paper Structure

This paper contains 32 sections, 20 equations, 9 figures, 8 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Multi-modal data of urban community 15-minute living circles.
  • Figure 2: Framework overview of the proposed CureGraph. The visual encoder leverages self-supervised contrastive learning with negative sampling to extract image features. The POI semantic encoder uses supervised contrastive learning to map POI review texts with the same sentiment label closer in the feature space. Additionally, the semantic augment encoder employs cross-modal contrastive learning techniques, integrating visual features to enrich textual semantics. The spatial multimodal graph network (SMGCN) is constructed by incorporating both modality similarity and spatial correlation. Finally, the multimodal embeddings are processed through a multilayer perceptron (MLP) for downstream elderly health prediction tasks. This framework aims to provide robust health predictions by effectively capturing multimodal and spatial interactions.
  • Figure 3: Distribution of POI categories and the number of reviews across the datasets.
  • Figure 4: Ablation study results for $R^2$ in community elder health prediction.
  • Figure 5: PCA-based visualization of urban multi-modal embeddings.
  • ...and 4 more figures
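
The Figure 2 caption says the SMGCN graph is built from both modality similarity and spatial correlation, but this page does not give the exact construction. The sketch below is one plausible reading, not the authors' implementation: it assumes cosine similarity between neighborhood embeddings, a Gaussian distance-decay weight as a stand-in for spatial correlation, an element-wise product to combine the two, and a standard symmetric-normalized graph convolution. All function names, the `bandwidth` parameter, and the random inputs are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def spatial_weight_matrix(coords, bandwidth):
    """Gaussian distance-decay weights (assumed stand-in for spatial correlation)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-((dist / bandwidth) ** 2))


def build_adjacency(features, coords, bandwidth=1.0):
    """Combine modality similarity and spatial weights into one adjacency matrix."""
    sim = cosine_similarity_matrix(features)
    sim = np.where(sim > 0.0, sim, 0.0)  # keep only positively similar pairs
    adj = sim * spatial_weight_matrix(coords, bandwidth)  # assumed combination rule
    np.fill_diagonal(adj, 1.0)  # self-loops so each node keeps its own features
    return adj


def gcn_layer(adj, h, weight):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 H W)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ h @ weight, 0.0)


# Toy inputs: 5 neighborhoods, 8-dimensional fused features, 2-D coordinates.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))
coords = rng.uniform(0.0, 2.0, size=(5, 2))

adj = build_adjacency(features, coords)
weight = rng.normal(size=(8, 4))  # untrained projection, for shape illustration only
embeddings = gcn_layer(adj, features, weight)
```

In this reading, two neighborhoods are strongly connected only when they are both similar in feature space and close in physical space, which is one way to let spatial structure guide message passing. The resulting `embeddings` would then feed an MLP for the downstream health prediction tasks described in the caption.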