Nested Fusion: A Method for Learning High Resolution Latent Structure of Multi-Scale Measurement Data on Mars

Austin P. Wright, Scott Davidoff, Duen Horng Chau

TL;DR

Nested Fusion tackles the challenge of learning high-resolution latent structure from nested, multi-scale measurements by formalizing nested datasets and employing a variational autoencoder that encodes hierarchical data into a max-resolution latent space. The method tokenizes heterogeneous data scales into a sequence processed by a dedicated encoder, with per-scale decoders that preserve cross-scale dependencies via a nested beta correspondence. Empirically, Nested Fusion outperforms joint and concatenative baselines in both qualitative latent structure and quantitative reconstruction fidelity on Mars PIXL data, and it has been deployed in NASA's PIXL scientific workflow with open-source tooling. The approach enables rapid, interpretable exploration of cross-modal patterns, significantly accelerating mineral identification workflows and informing future data-analysis designs for multi-scale planetary science datasets.
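As a rough illustration of the tokenization step described above, the following minimal sketch flattens one nested scan point (a single XRF vector with its associated MCC pixels) into a sequence of tokens tagged by scale. The names `ScanPoint` and `tokenize`, the dictionary token format, and the sample values are all hypothetical and chosen for illustration; the source states only that heterogeneous data scales are tokenized into a sequence, not how each token is represented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScanPoint:
    """Hypothetical container for one nested measurement."""
    xrf: List[float]               # one low-resolution XRF quantification vector
    mcc_pixels: List[List[float]]  # high-resolution MCC pixels nested under it


def tokenize(point: ScanPoint) -> List[Dict]:
    """Flatten a nested scan point into 1 XRF token followed by N pixel tokens."""
    tokens = [{"scale": "xrf", "index": 0, "values": point.xrf}]
    for i, pixel in enumerate(point.mcc_pixels):
        # Each pixel keeps its position so the encoder could emit one latent per pixel.
        tokens.append({"scale": "mcc", "index": i, "values": pixel})
    return tokens


# Made-up values, purely to show the shape of the sequence.
point = ScanPoint(
    xrf=[0.4, 0.1, 0.5],
    mcc_pixels=[[0.2, 0.3, 0.1, 0.9], [0.25, 0.31, 0.12, 0.88]],
)
tokens = tokenize(point)
```

In this sketch the scale tag stands in for whatever per-scale embedding a real encoder would apply; the point is only that vectors of different lengths can share one sequence.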

Abstract

The Mars Perseverance Rover represents a generational change in the scale of measurements that can be taken on Mars; however, this increased resolution introduces new challenges for exploratory data analysis techniques. Each of the rover's instruments measures specific properties of interest to scientists, so analyzing how underlying phenomena affect multiple instruments together is important for understanding the full picture. However, each instrument has a unique resolution, making the mapping between overlapping layers of data non-trivial. In this work, we introduce Nested Fusion, a method to combine arbitrarily layered datasets of different resolutions and produce a latent distribution at the highest possible resolution, encoding complex interrelationships between different measurements and scales. Our method is efficient for large datasets, can perform inference even on unseen data, and outperforms existing methods of dimensionality reduction and latent analysis on real-world Mars rover data. We have deployed Nested Fusion within a Mars science team at NASA Jet Propulsion Laboratory (JPL) and, through multiple rounds of participatory design, enabled greatly enhanced exploratory analysis workflows for real scientists. To ensure the reproducibility of our work, we have open-sourced our code on GitHub at https://github.com/pixlise/NestedFusion.

Paper Structure (15 sections, 6 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Model architecture and data processing pipeline for Nested Fusion as applied to PIXL data. High resolution latent vectors are encoded given a scan point containing an XRF quantification vector and collection of MCC imaging pixels.
  • Figure 2: Plate notation for graphical models representing different latent variable formulations for the PIXL MCC nested measurement dataset. From left to right: (Left) Nested Fusion, in which the latent corresponds to the maximum-resolution data scale and informs higher-level measurements through aggregation functions; (Center) the concatenative model, in which a latent at the maximum-resolution scale affects the corresponding higher-level measurements independently rather than in aggregate; and (Right) the joint model, in which a latent exists at low resolution and determines the whole distribution of all high-resolution measurements.
  • Figure 3: Comparison between alternate models and their relative downsides. The left column shows the dependence mappings from the learned latent spaces to the two measurement spaces for Nested Fusion. The center column shows how a joint encoding learns a lower-resolution representation, which overloads the decoder for high-resolution imaging data. The right column shows how a concatenative model ignores the full spatial context of the low-resolution measurements by forming a mapping from only a single high-resolution point.
  • Figure 4: Comparison of 2D latent distributions from different methods applied to the Dourbes target (RGB map of the MCC image shown in top right). Axes are unitless latent values. High-resolution models (left column: Nested Fusion and concatenative models) are displayed with 300 bins across each axis, while low-resolution joint models (right column) are displayed with 200 bins because of the differing number of samples in each model type.
  • Figure 5: Comparison of Nested Fusion and concatenative UMAP with latent dimension 2 in differentiating distinct minerals in the Dourbes target. A region of the target identified as pyroxene is shown in green, and a region identified as olivine is shown in red, based on existing analysis [tice2022alteration]. Comparing the latent sub-distributions of these two samples, Nested Fusion produces a distribution with a greater degree of separation between the two minerals.
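
The three latent formulations contrasted in Figure 2 differ mainly in how many latents exist per scan point and in what the low-resolution (XRF) decoder receives as input. The sketch below makes that difference concrete with toy numbers. It is an illustration under stated assumptions, not the authors' implementation: the helper `mean_pool`, the choice of a mean as the aggregation function, and all latent values are hypothetical.

```python
from statistics import fmean


def mean_pool(latents):
    """Average equal-length latent vectors component-wise (assumed aggregation)."""
    return [fmean(component) for component in zip(*latents)]


# Hypothetical 2-D latents for the 3 MCC pixels nested under one XRF scan point.
pixel_latents = [[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]]

# Nested Fusion: one latent per pixel; the XRF decoder sees an aggregate of them,
# so the low-resolution measurement depends on its full spatial context.
nested_xrf_inputs = [mean_pool(pixel_latents)]

# Concatenative: one latent per pixel; each latent is mapped to the XRF vector
# independently, ignoring the other pixels under the same scan point.
concat_xrf_inputs = [list(z) for z in pixel_latents]

# Joint: a single low-resolution latent (made-up value) must explain the XRF
# vector and every pixel, so no per-pixel latent structure is available.
joint_latent = [1.0, 0.5]
joint_xrf_inputs = [joint_latent]

print(nested_xrf_inputs)       # one aggregated input
print(len(concat_xrf_inputs))  # one input per pixel
print(len(joint_xrf_inputs))   # one input per scan point
```

Only the nested and concatenative variants yield a latent for every high-resolution pixel, and only the nested variant also conditions the low-resolution measurement on all of those latents together.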