
Major TOM: Expandable Datasets for Earth Observation

Alistair Francis, Mikolaj Czerkawski

TL;DR

This work proposes Major TOM, an extensible framework consisting of a geographical indexing system based on a set of grid points and a metadata structure that allows multiple datasets from different sources to be merged. It also presents MajorTOM-Core, a large, open-access dataset that covers the vast majority of the Earth’s land surface.

Abstract

Deep learning models are increasingly data-hungry, requiring significant resources to collect and compile the datasets needed to train them, and Earth Observation (EO) models are no exception. However, the landscape of EO datasets is relatively atomised, and diverse formats and data structures make interoperability difficult. If ever larger datasets are to be built, and duplication of effort minimised, a shared framework that allows users to combine and access multiple datasets is needed. Here, Major TOM (Terrestrial Observation Metaset) is proposed as this extensible framework. Primarily, it consists of a geographical indexing system based on a set of grid points and a metadata structure that allows multiple datasets from different sources to be merged. Besides specifying Major TOM as a framework, this work also presents a large, open-access dataset, MajorTOM-Core, which covers the vast majority of the Earth's land surface. This dataset provides the community with an immediately useful resource and acts as a template for future additions to the Major TOM ecosystem. Access: https://huggingface.co/Major-TOM
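To make the merging idea concrete, here is a minimal sketch, assuming each dataset ships a metadata table in which every sample is tagged with the identifier of the grid cell it belongs to. The field names (`grid_cell`, `source`, `product_id`), the cell identifiers, and the helper functions are hypothetical and are not taken from the Major TOM metadata specification; the point is only that a shared cell key reduces combining datasets to a group-by.

```python
from collections import defaultdict

# Made-up metadata rows from two independently built datasets.
# Field names and cell identifiers are illustrative assumptions only.
dataset_a = [
    {"grid_cell": "10U_20R", "source": "A", "product_id": "a-001"},
    {"grid_cell": "11U_20R", "source": "A", "product_id": "a-002"},
]
dataset_b = [
    {"grid_cell": "10U_20R", "source": "B", "product_id": "b-017"},
]


def merge_by_cell(*datasets):
    """Group rows from several datasets under their shared grid-cell key."""
    merged = defaultdict(list)
    for rows in datasets:
        for row in rows:
            merged[row["grid_cell"]].append(row)
    return dict(merged)


def cells_with_all_sources(merged, sources):
    """Return (sorted) grid cells in which every requested source has at least one sample."""
    wanted = set(sources)
    return sorted(
        cell for cell, rows in merged.items()
        if wanted <= {row["source"] for row in rows}
    )


index = merge_by_cell(dataset_a, dataset_b)
print(cells_with_all_sources(index, ["A", "B"]))  # ['10U_20R']
```

Because the join key is the grid cell rather than a file path or a source-specific tile name, a new dataset can be added without re-processing the existing ones, provided it tags its samples with the same cell identifiers.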

Paper Structure (10 sections, 5 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Major TOM paves the way for a standardized definition of AI-oriented datasets by relying on a specific grid standard. As an example, the MajorTOM-Core dataset delivers over 2 trillion pixels of Sentinel-2 data in total, covering nearly all land imaged by Sentinel-2.
  • Figure 2: An example of a Major TOM grid cell over Crete, with a Sentinel-2 sample. In this case, the sample is held in a projection, and overlaps slightly with other nearby grid cells.
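As a rough, hypothetical illustration of how a point could be assigned to a grid cell in a scheme of this kind, the sketch below assumes rows spaced a fixed distance apart along meridians and, within each row, columns spaced the same distance apart along that row's latitude circle, on a spherical Earth. The 10 km spacing, the Earth radius, the function names, and the `U`/`D` (north/south) and `R`/`L` (east/west) naming are all assumptions for illustration, not the Major TOM specification.

```python
import math

# Assumed parameters for this sketch (not taken from the Major TOM specification).
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical Earth is assumed
SPACING_KM = 10.0            # hypothetical distance between neighbouring grid points

# Latitude step (degrees) that corresponds to SPACING_KM along a meridian.
KM_PER_DEG = 2 * math.pi * EARTH_RADIUS_KM / 360.0
ROW_STEP_DEG = SPACING_KM / KM_PER_DEG


def row_index(lat: float) -> int:
    """Signed row index: 0 is the first row north of the equator, negative rows lie south."""
    return math.floor(lat / ROW_STEP_DEG)


def columns_in_row(row: int) -> int:
    """Number of columns in a row, chosen so that columns stay roughly SPACING_KM apart."""
    lat_edge = math.radians(row * ROW_STEP_DEG)
    circumference_km = 2 * math.pi * EARTH_RADIUS_KM * math.cos(lat_edge)
    return max(1, math.floor(circumference_km / SPACING_KM))


def cell_id(lat: float, lon: float) -> str:
    """Map a (lat, lon) point in degrees to a hypothetical cell name such as '12U_34R'."""
    lon = (lon + 180.0) % 360.0 - 180.0         # wrap longitude into [-180, 180)
    row = row_index(lat)
    col_step_deg = 360.0 / columns_in_row(row)  # columns span more degrees near the poles
    col = math.floor(lon / col_step_deg)
    row_name = f"{abs(row)}{'U' if row >= 0 else 'D'}"
    col_name = f"{abs(col)}{'R' if col >= 0 else 'L'}"
    return f"{row_name}_{col_name}"


print(cell_id(35.2, 24.9))  # hypothetical name of the cell containing a point on Crete
```

Recomputing the number of columns per row keeps each cell roughly the same ground width, whereas a plain latitude/longitude grid would produce cells that shrink toward the poles. Under these assumptions a sample footprint that is not aligned with the grid, such as the one in Figure 2, can still extend slightly into neighbouring cells.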