Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes
Ivica Obadic, Alex Levering, Lars Pennig, Dario Oliveira, Diego Marcos, Xiaoxiang Zhu
TL;DR
This work tackles the opacity of deep models that predict socioeconomic indicators from satellite imagery. It proposes a post-hoc concept-explanation pipeline in which Rank-N-Contrast (RNC) pretraining orders the latent representations by the target outcome, followed by a linear regressor and TCAV-based concept testing. The resulting latent space is continuously ordered with respect to the outcome, so concept explanations cluster by outcome interval and reveal urban patterns associated with different socioeconomic levels. On two tasks in two geographies (income in France and liveability in the Netherlands), the approach provides interpretable insights into which concepts drive different outcome ranges; for income, it also improves predictive performance by about $0.10$ in $R^2$ and $0.08$ in Kendall's $\tau$. Crucially, it does not require location-specific concept labels, which enables cross-region applicability, and its concept sensitivities offer urban-planning insights, such as the role of vegetation in higher-income or higher-liveability areas.
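To make the ordering idea concrete, the following is a minimal sketch of a Rank-N-Contrast style loss, not the authors' implementation. It assumes negative Euclidean distance as the embedding similarity, absolute difference as the target distance, and a temperature of 1.0; the function name `rnc_loss` and the toy data are hypothetical. For each anchor i and each other sample j, the loss contrasts j against every sample k whose target is at least as far from the anchor's target as j's is.

```python
import math


def rnc_loss(embeddings, targets, temperature=1.0):
    """Rank-N-Contrast style loss over a batch (illustrative sketch).

    Assumptions: similarity is negative Euclidean distance and the
    target distance is the absolute difference of scalar outcomes.
    """
    n = len(embeddings)

    def sim(a, b):
        # Negative L2 distance: closer embeddings are more similar.
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    total = 0.0
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            d_ij = abs(targets[i] - targets[j])
            numerator = math.exp(sim(embeddings[i], embeddings[j]) / temperature)
            # Denominator: samples whose target is at least as far from
            # the anchor's target as sample j's target (j itself included).
            denominator = sum(
                math.exp(sim(embeddings[i], embeddings[k]) / temperature)
                for k in range(n)
                if k != i and abs(targets[i] - targets[k]) >= d_ij
            )
            total += -math.log(numerator / denominator)
    return total / (n * (n - 1))


# Toy outcomes (e.g. income levels) and two candidate 1-D latent spaces.
outcomes = [0.0, 1.0, 2.0, 3.0]
ordered = [[0.0], [1.0], [2.0], [3.0]]   # embeddings follow the outcome order
shuffled = [[2.0], [0.0], [3.0], [1.0]]  # embeddings ignore the outcome order

loss_ordered = rnc_loss(ordered, outcomes)
loss_shuffled = rnc_loss(shuffled, outcomes)
```

In this toy case the latent space that follows the outcome order receives the lower loss, which is the property the pretraining stage relies on.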
Abstract
Predicting socioeconomic indicators from satellite imagery with deep learning has become an increasingly popular research direction. Post-hoc concept-based explanations can be an important step towards broader adoption of these models in policy-making, as they enable the interpretation of socioeconomic outcomes based on visual concepts that are intuitive to humans. In this paper, we study the interplay between representation learning using an additional task-specific contrastive loss and post-hoc concept explainability for socioeconomic studies. Our results on two geographical locations and tasks indicate that the task-specific pretraining imposes a continuous ordering of the latent space embeddings according to the socioeconomic outcomes. This improves the model's interpretability, as it enables the latent space of the model to associate concepts encoding typical urban and natural area patterns with continuous intervals of socioeconomic outcomes. Further, we illustrate how analyzing the model's conceptual sensitivity for intervals of socioeconomic outcomes can yield new insights for urban studies.
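The notion of conceptual sensitivity can be illustrated with a small TCAV-style sketch. This is a simplified illustration under stated assumptions rather than the pipeline used in the paper: the concept activation vector is approximated by the normalized difference between the mean concept activation and the mean random activation (the original TCAV formulation uses the normal of a linear classifier instead), the directional derivative is estimated by central finite differences, and the prediction head `toy_head` and all activation values are hypothetical. Restricting `activations` to samples from one outcome interval would give a per-interval score.

```python
import math


def concept_activation_vector(concept_acts, random_acts):
    """Unit vector pointing from random activations towards concept ones.

    Simplification (assumption): difference of means instead of the
    normal vector of a trained linear classifier.
    """
    dim = len(concept_acts[0])
    mean_c = [sum(a[d] for a in concept_acts) / len(concept_acts) for d in range(dim)]
    mean_r = [sum(a[d] for a in random_acts) / len(random_acts) for d in range(dim)]
    diff = [c - r for c, r in zip(mean_c, mean_r)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def sensitivity(head, z, cav, eps=1e-4):
    """Directional derivative of head at z along cav (central difference)."""
    plus = [a + eps * v for a, v in zip(z, cav)]
    minus = [a - eps * v for a, v in zip(z, cav)]
    return (head(plus) - head(minus)) / (2 * eps)


def tcav_score(head, activations, cav):
    """Fraction of samples whose prediction increases along the concept."""
    positives = sum(1 for z in activations if sensitivity(head, z, cav) > 0)
    return positives / len(activations)


def toy_head(z):
    # Hypothetical nonlinear head mapping a 2-D activation to an outcome.
    return math.tanh(z[0]) + 0.1 * z[1]


# Hypothetical activations for a concept (e.g. vegetation) and random images.
concept_acts = [[1.0, 0.0], [1.2, 0.1]]
random_acts = [[0.0, 0.0], [-0.2, 0.1]]
cav = concept_activation_vector(concept_acts, random_acts)

# Hypothetical activations of samples from one outcome interval.
interval_acts = [[-1.0, 0.5], [0.0, 0.0], [0.5, -0.3], [2.0, 1.0]]
score = tcav_score(toy_head, interval_acts, cav)
score_opposite = tcav_score(toy_head, interval_acts, [-v for v in cav])
```

A score near 1 means the concept direction raises the predicted outcome for most samples in the interval, and a score near 0 means it lowers it.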
