EMPLACE: Self-Supervised Urban Scene Change Detection
Tim Alpherts, Sennay Ghebreab, Nanne van Noord
TL;DR
EMPLACE addresses urban scene change detection by introducing AC-1M, a large-scale tri-temporal panorama dataset, and a self-supervised framework that learns change representations without labels. The method uses a Vision Transformer (ViT-B/14) initialized with DINOv2 and trained with an adaptive triplet loss, complemented by cut-and-flip augmentation to handle panoramic imagery and noise. EMPLACE demonstrates strong performance as both a pre-training method and in zero-shot change detection, outperforming state-of-the-art baselines, and its Amsterdam case study shows that small visual changes correlate with housing prices while large changes exhibit distinct patterns. Collectively, the work enables scalable, label-free USCD across cities and provides tools to link visual urban change to socio-economic indicators, advancing Visual Urban Analytics beyond static, single-image analyses.
Abstract
Urban change is a constant process that influences the perception of neighbourhoods and the lives of the people within them. The field of Urban Scene Change Detection (USCD) aims to capture changes in street scenes using computer vision and can help raise awareness of changes that make it possible to better understand the city and its residents. Traditionally, the field of USCD has used supervised methods with small-scale datasets. This constrains methods when applied to new cities, as it requires labour-intensive labelling processes and forces a priori definitions of relevant change. In this paper we introduce AC-1M, by far the largest USCD dataset with over 1.1M images, together with EMPLACE, a self-supervised method for training a Vision Transformer using our adaptive triplet loss. We show that EMPLACE outperforms SOTA methods both as a pre-training method for linear fine-tuning and in a zero-shot setting. Lastly, in a case study of Amsterdam, we show that we are able to detect both small and large changes throughout the city and that changes uncovered by EMPLACE, depending on size, correlate with housing prices, which in turn is indicative of inequity.
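
For readers unfamiliar with triplet objectives, the sketch below shows the standard (non-adaptive) triplet margin loss on embedding vectors. It is an illustration only, not the adaptive triplet loss of EMPLACE, whose exact form is not given in this section. The function name `triplet_margin_loss`, the fixed `margin`, the Euclidean distance, and the toy embeddings are all assumptions made for the example.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss, averaged over a batch.

    Illustrative only: a fixed margin and Euclidean distance are assumed here;
    the adaptive variant used by EMPLACE is not reproduced.
    Each argument is an array of shape (batch, dim).
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-to-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-to-negative distance
    # Penalise triplets where the negative is not at least `margin` farther away.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy 2-D embeddings (hypothetical values, not taken from AC-1M).
anchor = np.array([[0.0, 0.0], [1.0, 0.0]])
positive = np.array([[0.0, 1.0], [1.0, 0.0]])
negative = np.array([[3.0, 4.0], [1.0, 0.5]])

loss = triplet_margin_loss(anchor, positive, negative, margin=1.0)
print(loss)
```

In the first toy triplet the negative is already far from the anchor and contributes nothing to the loss. In the second, the negative lies within the margin and is penalised. An adaptive scheme would presumably adjust this trade-off per triplet instead of using one fixed margin.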
