Functional Map of the World

Gordon Christie, Neil Fendley, James Wilson, Ryan Mukherjee

TL;DR

The paper introduces Functional Map of the World (fMoW), a large-scale remote-sensing dataset of over 1 million images organized into temporal sequences, with 4- and 8-band multispectral imagery and rich metadata, annotated with 63 categories (including a 'false detection' category). It explores joint reasoning over temporal image sequences and metadata using CNN- and LSTM-based baselines, showing that metadata fusion and temporal context improve classification over image information alone. The dataset is collected via a three-phase pipeline combining locations derived from volunteered geographic information (VGI) with GeoHIVE crowdsourcing, and is released in two variants (fMoW-full and fMoW-rgb) to balance completeness and size. The authors publicly release the data, code, and pretrained models, discuss geographic and labeling biases, and highlight potential humanitarian applications such as disaster response, positioning fMoW as a benchmark for multimodal, temporally aware remote-sensing research.

Abstract

We present a new dataset, Functional Map of the World (fMoW), which aims to inspire the development of machine learning models capable of predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features. The metadata provided with each image enables reasoning about location, time, sun angles, physical sizes, and other features when making predictions about objects in the image. Our dataset consists of over 1 million images from over 200 countries. For each image, we provide at least one bounding box annotation containing one of 63 categories, including a "false detection" category. We present an analysis of the dataset along with baseline approaches that reason about metadata and temporal views. Our data, code, and pretrained models have been made publicly available.
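
As an illustration of how per-image metadata might be combined with image features, the following is a minimal sketch only. The metadata keys (`timestamp`, `lat`, `lon`, `sun_elevation_deg`, `sun_azimuth_deg`), the scaling choices, and the function names are assumptions made for this example; they are not the fMoW metadata schema or the authors' baseline code. The idea shown is simple concatenation of a normalized metadata vector onto a feature vector produced by some image model.

```python
import math
from datetime import datetime


def encode_metadata(meta):
    """Turn a metadata dict (hypothetical keys) into floats in roughly [-1, 1].

    Cyclic quantities (time of day, month, azimuth) are encoded as sin/cos
    pairs so that values on either side of the wrap-around stay close together.
    """
    ts = datetime.fromisoformat(meta["timestamp"])
    hour_angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
    month_angle = 2 * math.pi * (ts.month - 1) / 12
    azimuth = math.radians(meta["sun_azimuth_deg"])
    return [
        meta["lat"] / 90.0,                # latitude scaled to [-1, 1]
        meta["lon"] / 180.0,               # longitude scaled to [-1, 1]
        math.sin(hour_angle), math.cos(hour_angle),
        math.sin(month_angle), math.cos(month_angle),
        meta["sun_elevation_deg"] / 90.0,  # elevation scaled to [-1, 1]
        math.sin(azimuth), math.cos(azimuth),
    ]


def fuse(image_features, meta):
    """Concatenate image features with the encoded metadata vector."""
    return list(image_features) + encode_metadata(meta)


if __name__ == "__main__":
    example_meta = {
        "timestamp": "2016-03-05T12:00:00",
        "lat": 45.0,
        "lon": -90.0,
        "sun_elevation_deg": 45.0,
        "sun_azimuth_deg": 180.0,
    }
    # Stand-in for features from an image model (hypothetical values).
    fused = fuse([0.1, 0.2], example_meta)
    print(len(fused), fused)
```

The fused vector would then be passed to a classifier; for a temporal sequence, one such vector per view could be fed to a sequence model.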

Paper Structure

This paper contains 12 sections, 13 figures, 4 tables.

Figures (13)

  • Figure 1: In fMoW, temporal sequences of images, multispectral imagery, metadata, and bounding boxes are provided. In this example, if we only look inside the yellow box in the right image, we will only see a road and vegetation. On the other hand, if we only see the water in the left image, then we will potentially predict this to be a lake. However, by observing both views of this area, we can now reason that this sequence contains a flooded road.
  • Figure 2: Sample image of what a GeoHIVE user might see while validating potential fMoW dataset features. Instructions can be seen in the top-left corner that inform users to press the '1', '2', or '3' keys to validate existence, non-existence, or cloud obscuration of a particular object.
  • Figure 3: The total number of instances for each category (including FD, the false-detection category) in fMoW, broken down by number of bands. These numbers include the temporal views of the same areas. fMoW-full consists of 3-band imagery (pan-sharpened RGB), as well as 4- and 8-band imagery. In fMoW-rgb, the RGB channels of the 4- and 8-band imagery are extracted and saved as JPEG images.
  • Figure 4: The distribution of the number of temporal views in our dataset. A pan-sharpened image and its multispectral counterpart count as a single temporal view: the two have almost identical metadata files and are therefore not counted twice. The maximum number of temporal views for any area in the dataset is 41.
  • Figure 5: The geographic diversity of fMoW. Data was collected from over 400 unique UTM zones (including latitude bands). The map illustrates the number of images captured in each UTM zone: greener colors indicate zones with more instances, and bluer colors indicate zones with fewer.
  • ...and 8 more figures
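
The counting rule described in the Figure 4 caption (a pan-sharpened image and its multispectral counterpart are not counted as two temporal views) can be sketched as follows. This is an illustrative sketch under stated assumptions: the record fields `area_id`, `acquired`, and `product` are hypothetical names invented for the example, and matching the two products by identical acquisition time is an assumed stand-in for "almost identical metadata files".

```python
from collections import Counter


def temporal_view_counts(records):
    """Count temporal views per area, collapsing products of one acquisition.

    Each record is a dict with hypothetical keys:
      area_id  - identifier of the imaged area
      acquired - acquisition timestamp string
      product  - e.g. "rgb" (pan-sharpened) or "ms" (multispectral)
    Records sharing (area_id, acquired) are treated as a single view.
    """
    views = {(r["area_id"], r["acquired"]) for r in records}
    return Counter(area for area, _ in views)


def view_histogram(records):
    """Map number-of-views -> number of areas with that many views."""
    return Counter(temporal_view_counts(records).values())


if __name__ == "__main__":
    sample = [
        {"area_id": "a", "acquired": "2015-01-01T10:00:00", "product": "rgb"},
        {"area_id": "a", "acquired": "2015-01-01T10:00:00", "product": "ms"},
        {"area_id": "a", "acquired": "2016-06-01T10:00:00", "product": "rgb"},
        {"area_id": "b", "acquired": "2015-03-01T09:00:00", "product": "ms"},
    ]
    print(temporal_view_counts(sample))
    print(view_histogram(sample))
```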