Cross-Modal Learning of Housing Quality in Amsterdam

Alex Levering, Diego Marcos, Devis Tuia

TL;DR

With careful filtering and the right pre-trained model, Flickr image features combined with aerial image features halve the performance gap to GSV features, from 30% to 15%.

Abstract

We test data and models for recognizing housing quality in the city of Amsterdam from ground-level and aerial imagery. For ground-level images, we compare Google StreetView (GSV) to Flickr images. Our results show that GSV features yield the most accurate building quality scores, approximately 30% better than using only aerial images. However, we find that with careful filtering and the right pre-trained model, Flickr image features combined with aerial image features halve the performance gap to GSV features, from 30% to 15%. These results indicate that there are viable alternatives to GSV for predicting liveability factors, which is encouraging because GSV images are more difficult to acquire and not always available.

Paper Structure (6 sections, 2 equations, 3 figures, 3 tables)


Figures (3)

  • Figure 1: Data splits of our experiments over the city of Amsterdam. Testing squares are padded by validation cells to ensure that no test data is seen during training. Cyan squares are for training, red for validation, and blue for testing. Black points represent geotagged photos of the Flickr buildings subset.
  • Figure 2: Multimodal model predicting housing quality scores. Only the aerial branch and the merging layer shown in blue are trained. Features extracted from the ground-level images are fixed. Depending on the subset, the ground-level image branch uses features extracted from either Google StreetView, or Flickr images.
  • Figure 3: Plots of predictions of building quality score for the best model of each data subset on the two most spatially diverse tiles. Their locations are displayed in Figure 1. Colors range from red (low-quality) to blue (high-quality).
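
The padded split described in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' code: the grid size, the one-cell padding width, the chosen test cells, and the function name `padded_grid_split` are all assumptions made for illustration. It only shows the general idea of surrounding each test cell with validation cells so that no training cell borders a test cell.

```python
import numpy as np

# Hypothetical integer labels for the three subsets.
TRAIN, VAL, TEST = 0, 1, 2

def padded_grid_split(n_rows, n_cols, test_cells, pad=1):
    """Label grid cells as train/val/test, padding test cells with val cells.

    Every cell within `pad` cells (in row and column) of a test cell
    is marked as validation; all remaining cells are training cells.
    """
    labels = np.full((n_rows, n_cols), TRAIN, dtype=int)
    # First mark the padded neighbourhood of each test cell as validation.
    for r, c in test_cells:
        r0, r1 = max(r - pad, 0), min(r + pad, n_rows - 1)
        c0, c1 = max(c - pad, 0), min(c + pad, n_cols - 1)
        labels[r0:r1 + 1, c0:c1 + 1] = VAL
    # Then mark the test cells themselves, so they are never overwritten.
    for r, c in test_cells:
        labels[r, c] = TEST
    return labels

# Assumed toy example: a 6x6 grid with two test cells.
labels = padded_grid_split(6, 6, test_cells=[(2, 2), (0, 5)])
print(labels)
```

With this layout, any photo or aerial tile assigned to a training cell is at least one full cell away from every test cell, which is the property the caption describes.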