Multiple Random Masking Autoencoder Ensembles for Robust Multimodal Semi-supervised Learning

Alexandru-Raul Todoran, Marius Leordeanu

TL;DR

The paper tackles learning across multiple data modalities with missing observations by introducing MR-MAE, a framework that uses multiple random masking of features to train a flexible, task-agnostic predictor that implicitly forms a large ensemble of input–output mappings. It adds an automatic feature-importance mechanism via a Loss Matrix and enables semi-supervised learning through ensemble-based pseudo-labels. The authors validate MR-MAE on NASA's Earth Observation NEO dataset with 19 layers, demonstrating robustness to missing data, competitive performance against a multi-task hyper-graph model, and clear advantages in feature interpretation and climate insight discovery. The work suggests practical climate science applications and points to future improvements by incorporating stronger backbones such as Transformer architectures.
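The idea of forming an implicit ensemble through multiple random masking can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `keep_prob` parameter, the zero-filling of hidden features and the toy predictor are all assumptions made for illustration. The sketch hides the target feature, draws several random masks over the remaining inputs, and averages the resulting predictions; the mean could then serve as a pseudo-label and the spread as a rough confidence signal.

```python
import numpy as np

def masked_ensemble_predict(predict_fn, x, target_idx, n_masks=32, keep_prob=0.5, seed=0):
    """Average predictions for feature `target_idx` over several random input masks.

    `predict_fn(x_masked, mask)` is a hypothetical model interface: it receives the
    masked feature vector and the boolean visibility mask, and returns a prediction
    for every feature.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    others = [i for i in range(n_features) if i != target_idx]
    preds = []
    for _ in range(n_masks):
        mask = rng.random(n_features) < keep_prob  # True means the feature is visible
        mask[target_idx] = False                   # the target is always hidden
        if not mask.any():
            mask[rng.choice(others)] = True        # keep at least one visible input
        x_masked = np.where(mask, x, 0.0)          # hidden features are zero-filled (assumption)
        preds.append(predict_fn(x_masked, mask)[target_idx])
    preds = np.asarray(preds)
    return preds.mean(), preds.std()

def toy_predict(x_masked, mask):
    """Stand-in for a trained model: predicts every feature as the mean of the visible ones."""
    return np.full(x_masked.shape, x_masked[mask].mean())

x = np.array([1.0, 1.0, 1.0, 5.0])
mean_pred, spread = masked_ensemble_predict(toy_predict, x, target_idx=3)
```

Each random mask corresponds to one input-to-output mapping, so a single model trained this way can be queried as many ensemble members as there are masks.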

Abstract

A growing number of real-world problems in computer vision and machine learning require taking into consideration multiple interpretation layers (modalities or views) of the world and learning how they relate to each other. For example, in the case of Earth Observations from satellite data, it is important to be able to predict one observation layer (e.g. vegetation index) from other layers (e.g. water vapor, snow cover, temperature, etc.), both to better understand how the Earth System functions and to reliably predict information for one layer when its data is missing (e.g. due to measurement failure or error).

Paper Structure (15 sections, 11 figures, 1 table)

This paper contains 15 sections, 11 figures, 1 table.

Figures (11)

  • Figure 1: One iteration of the learning process for training the MAE model
  • Figure 2: Our method of automatically estimating feature importance by computing the Loss Matrix. The figure shows how one prediction case translates to changes in the matrix. In particular, one pair of features and its corresponding cell are highlighted
  • Figure 3: The algorithm for constructing implicit ensembles on the base model
  • Figure 4: The plot shows prediction accuracy as we move away from the last seen training case. If there were little to no climatic change, we would expect no significant decay in accuracy, but the line of best fit clearly shows a steady decrease in accuracy, which points towards climate change
  • Figure 5: We perform the analysis on four different layers: NDVI (a), LSTD (b), CHLORA (c) and AOD (d). The figure shows the global map highlighted based on the first eigenvalue (left), from which we select some patches for PCA (yellow box in magenta circle). Note that high-variance regions (indicating larger shifts) are mostly around highly populated areas and at the border between continent and ocean. We project the 12-dimensional data points (corresponding to the yellow patch on the left) onto the first two PCA components and plot them in 2D for the years 2005-2021 (blue to red). Note, for example, how in plot (a) the points move from left (years close to 2005) to right (years close to 2021), clearly indicating a climate distribution shift.
  • ...and 6 more figures
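
The PCA projection described in the Figure 5 caption can be sketched with plain NumPy. This is a generic PCA via singular value decomposition, not the authors' code; the synthetic data, the layout of one 12-dimensional point per year, and the function name are assumptions made for illustration.

```python
import numpy as np

def project_first_two_pcs(points):
    """Project rows of `points` (n_samples, n_dims) onto their first two principal components.

    Returns the 2D coordinates and the eigenvalues of the sample covariance matrix,
    sorted from largest to smallest.
    """
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions; singular values come sorted descending.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / (points.shape[0] - 1)
    return centered @ vt[:2].T, eigvals

# Synthetic stand-in: 17 yearly points (2005-2021), 12 dimensions each, with a slow drift.
rng = np.random.default_rng(0)
years = np.arange(2005, 2022)
drift = np.outer(years - years[0], np.linspace(0.1, 0.3, 12))
synthetic = drift + rng.normal(scale=0.5, size=(years.size, 12))

coords, eigvals = project_first_two_pcs(synthetic)
```

With a drift like the one simulated here, the first coordinate changes steadily with the year, which is the kind of left-to-right movement the caption describes. The first eigenvalue measures how much variance lies along that dominant direction.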