
Self-supervised learning unveils change in urban housing from street-level images

Steven Stalder, Michele Volpi, Nicolas Büttner, Stephen Law, Kenneth Harttgen, Esra Suel

TL;DR

Street2Vec, a novel adaptation of Barlow Twins, embeds urban structure from street-level images while being invariant to seasonal and daily changes, without manual annotations. These embeddings can provide timely information for urban planning and policy decisions toward more liveable, equitable, and sustainable cities.

Abstract

Cities around the world face a critical shortage of affordable and decent housing. Despite the importance of this issue for policy, our ability to effectively monitor and track progress in urban housing is limited. Deep learning-based computer vision methods applied to street-level images have been successful in measuring socioeconomic and environmental inequalities, but they have not fully utilized temporal images to track urban change, because time-varying labels are often unavailable. We used self-supervised methods to measure change in London using 15 million street images taken between 2008 and 2021. Our novel adaptation of Barlow Twins, Street2Vec, embeds urban structure while being invariant to seasonal and daily changes without manual annotations. It outperformed generic embeddings, successfully identified point-level change in London's housing supply from street-level images, and distinguished between major and minor change. This capability can provide timely information for urban planning and policy decisions toward more liveable, equitable, and sustainable cities.
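The Barlow Twins objective that Street2Vec adapts can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes that row i of the two input batches holds embeddings of two images of the same location taken at different times, so that minimizing the loss pushes the model toward invariance to seasonal and daily appearance changes. The trade-off weight λ = 5·10⁻³ follows the original Barlow Twins paper and is an assumption here; the encoder network is omitted, and the toy embeddings are hypothetical.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray, lambd: float = 5e-3, eps: float = 1e-5) -> float:
    """Barlow Twins loss for two (N, D) batches whose i-th rows are paired views."""
    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch (zero mean, unit std).
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Empirical cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diagonal(c)
    # Invariance term: push the diagonal toward 1 so paired views agree.
    invariance = np.sum((1.0 - diag) ** 2)
    # Redundancy-reduction term: push off-diagonal correlations toward 0.
    redundancy = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(invariance + lambd * redundancy)


# Hypothetical embeddings: each row of z stands for one location.
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 32))
z_noisy = z + 0.1 * rng.normal(size=z.shape)  # same places, small appearance change
z_other = rng.normal(size=z.shape)            # unrelated images
loss_paired = barlow_twins_loss(z, z_noisy)
loss_unpaired = barlow_twins_loss(z, z_other)
print(loss_paired, loss_unpaired)  # the paired loss is far lower
```

The diagonal term is what rewards agreement between paired views, while the off-diagonal term discourages the embedding dimensions from encoding redundant information.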

Paper Structure (14 sections, 3 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Overview of the full pipeline. (a) Our proposed Street2Vec method, in which we apply Barlow Twins (Zbontar et al., 2021) to street-level images. (b) Illustration of our application of the trained Street2Vec model for mapping urban change between 2008 and 2018.
  • Figure 2: Change detected from Street2Vec embeddings in London: (a) map of predicted mean change for all Middle Super Output Areas (MSOAs), (b) map of predicted mean change for Opportunity Areas (OAs) announced by the Mayor of London as areas with substantial potential for new developments, (c) distribution of point-level change detected in OAs in London compared with non-opportunity areas (Non-OA) in London. In (a) and (b), darker red colors correspond to higher levels of predicted change.
  • Figure 3: (a) Histogram of cosine distances between the embeddings of images in 2008 and 2018. (b) Example image pairs with increasing cosine distances between them.
  • Figure 4: Histogram of cosine distances per class, for our model (blue) and for the baseline (orange).
  • Figure 5: (a) UMAP (McInnes et al., 2018) projection of 10,000 randomly sampled street-level images from London that have been processed by Street2Vec. We color the points in a circular manner according to their position in this space. Note that we omit axes and scales, as the absolute values of the UMAP embedding are meaningless and the plot is purely qualitative. At the extremes of the two UMAP dimensions, we plot the three images corresponding to the minimal or maximal values, respectively. (b) The same points on the map of London. Similar colors mean that two data points are close according to the learned representations.
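Figures 2 and 3 rest on two simple operations: a cosine distance between the embeddings of the same location at two time points, and an average of those point-level scores within each area. The NumPy sketch below assumes the point-level change score is exactly this cosine distance; the function names, the toy embeddings, and the area labels "A" and "B" are hypothetical stand-ins for Street2Vec outputs and MSOA codes.

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine distance (1 - cosine similarity) between two (N, D) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)


def mean_change_per_area(distances: np.ndarray, area_ids: np.ndarray) -> dict[str, float]:
    """Average the point-level change scores within each area (e.g. an MSOA)."""
    return {str(area): float(distances[area_ids == area].mean()) for area in np.unique(area_ids)}


# Toy data: six locations with 16-dimensional embeddings (hypothetical).
rng = np.random.default_rng(0)
emb_2008 = rng.normal(size=(6, 16))
emb_2018 = emb_2008.copy()
emb_2018[3:] = rng.normal(size=(3, 16))  # locations 3-5 are replaced, i.e. "changed"
areas = np.array(["A", "A", "A", "B", "B", "B"])

point_change = cosine_distance(emb_2008, emb_2018)
area_change = mean_change_per_area(point_change, areas)
print(area_change)  # area "A" is ~0; area "B" is much larger
```

Because cosine distance ignores vector length, only a change in the direction of an embedding registers as change: a distance near 0 means the two images look alike under the learned representation, while larger values indicate more substantial change.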