
MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications

Mirali Purohit, Bimal Gajera, Irish Mehta, Bhanu Tokas, Jacob Adler, Steven Lu, Scott Dickenshied, Serina Diniega, Brian Bue, Umaa Rebbapragada, Hannah Kerner

Abstract

We introduce MOMO, the first multi-sensor foundation model for Mars remote sensing. MOMO uses model merging to integrate representations learned independently from three key Martian sensors (HiRISE, CTX, and THEMIS), spanning resolutions from 0.25 m/pixel to 100 m/pixel. Central to our method is our novel Equal Validation Loss (EVL) strategy, which aligns checkpoints across sensors based on validation-loss similarity before fusing them via task arithmetic. This ensures that models are merged at compatible convergence stages, leading to improved stability and generalization. We train MOMO on a large-scale, high-quality corpus of $\sim 12$ million samples curated from Mars orbital data and evaluate it on 9 downstream tasks from Mars-Bench. MOMO achieves better overall performance than ImageNet pre-trained, Earth observation foundation model, sensor-specific pre-training, and fully-supervised baselines. On segmentation tasks in particular, MOMO shows consistent and significant performance improvements. Our results demonstrate that model merging with an optimal checkpoint selection strategy provides an effective approach for building foundation models for multi-resolution data. The model weights, pretraining code, pretraining data, and evaluation code are available at: https://github.com/kerner-lab/MOMO.
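The two ideas in the abstract — aligning per-sensor checkpoints by validation loss before merging, and fusing them via task arithmetic — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the reference-loss choice (the largest per-sensor minimum, so every sensor has a checkpoint near it), the function names, and the scalar `scale` are assumptions for illustration; parameters are plain floats standing in for weight tensors.

```python
def select_evl_checkpoints(histories):
    """Pick one checkpoint per sensor with similar validation loss.

    histories: {sensor: [val loss at checkpoint 0, 1, ...]}
    Assumed reference: the largest of the per-sensor minimum losses,
    so each sensor has a checkpoint close to it. Returns the index of
    the closest checkpoint per sensor.
    """
    ref = max(min(h) for h in histories.values())
    return {s: min(range(len(h)), key=lambda i: abs(h[i] - ref))
            for s, h in histories.items()}


def task_arithmetic_merge(base, experts, scale=1.0):
    """Task-arithmetic fusion: add the summed parameter deltas
    (expert minus base) back onto the shared base model.

    base, experts: parameter dicts {name: value}; values are floats
    here, but the same arithmetic applies elementwise to tensors.
    """
    merged = {}
    for name, base_val in base.items():
        delta = sum(e[name] - base_val for e in experts)
        merged[name] = base_val + scale * delta
    return merged


# Toy example with three sensors and a one-parameter "model".
histories = {"HiRISE": [1.0, 0.5, 0.3],
             "CTX":    [0.9, 0.6, 0.4],
             "THEMIS": [1.2, 0.8, 0.5]}
picks = select_evl_checkpoints(histories)   # checkpoints near loss 0.5
merged = task_arithmetic_merge({"w": 0.0},
                               [{"w": 1.0}, {"w": 2.0}],
                               scale=0.5)
```

With these histories the reference loss is 0.5 (THEMIS's best), so each sensor contributes the checkpoint closest to that loss rather than its own last or best checkpoint, which is the point of EVL: the merged models sit at comparable convergence stages.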

Paper Structure

This paper contains 56 sections, 8 equations, 20 figures, 7 tables.

Figures (20)

  • Figure 1: MOMO can be effectively applied across a wide range of resolutions and a broad spectrum of Martian remote sensing tasks. By leveraging diverse sensors, our approach enables a single model to generalize across different orbital applications, including large-scale crater or landslide mapping and precise boulder localization.
  • Figure 2: Illustrative samples of poor- and high-quality image samples from the HiRISE, CTX, and THEMIS sensors. The top row shows rejected low-quality samples exhibiting artifacts, blur, or noise, while the bottom row shows high-quality samples retained for pre-training.
  • Figure 3: Loss landscape visualization across different checkpoint selection strategies on DoMars16k and Landmark datasets. The red markers represent MOMO obtained using Early Stopping (ES), Last Epoch (LE), and Equal Validation Loss (EVL), respectively.
  • Figure 4: Example of a HiRISE map-projected image used in our study. The dark border around the image represents no-data regions that were filtered out during preprocessing to ensure high-quality crop selection.
  • Figure 5: HiRISE pre-training data distribution
  • ...and 15 more figures