Datacube segmentation via Deep Spectral Clustering

Alessandro Bombini, Fernando García-Avello Bofías, Caterina Bracci, Michele Ginolfi, Chiara Ruberto

TL;DR

The paper tackles the problem of segmenting high-dimensional spectral datacubes by performing unsupervised deep clustering in a learned latent space. It introduces a deep clustering framework built from an autoencoder that maps spectra to a latent representation and an iterative, learnable IKMeans for clustering, with a reconstruction objective and a clustering loss guiding the representation. It extends to a variational deep embedding using beta-VAE/InfoVAE with an MMD-based regularizer, enabling a generative, end-to-end trainable model whose epoch-dependent losses balance reconstruction, clustering, and latent regularization. The method is demonstrated on two synthetic use cases—astrophysical datacubes and MA-XRF cultural heritage datacubes—showing high-quality segmentation, meaningful cluster interpretation, and resilience to noise, with code and data made publicly available for reproducibility and further development.
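
The balance between reconstruction, clustering, and latent regularization described above can be sketched as a weighted sum of three terms. This is a minimal NumPy illustration under stated assumptions, not the paper's actual loss: the function names, the linear ramp schedule, the RBF kernel, and the default values `warmup`, `ramp_len`, `mmd_weight`, and `sigma` are all hypothetical choices made for this example.

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    # Mean squared error between input spectra and their reconstructions.
    return float(np.mean((x - x_hat) ** 2))

def clustering_loss(z, centroids):
    # Mean squared distance from each latent point to its nearest centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean())

def rbf_mmd(z, z_prior, sigma=1.0):
    # Biased estimate of the squared MMD with an RBF kernel (assumed kernel).
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return float(kernel(z, z).mean() + kernel(z_prior, z_prior).mean()
                 - 2.0 * kernel(z, z_prior).mean())

def ramp(epoch, start, length):
    # Hypothetical schedule: 0 before `start`, then linear up to 1.
    return float(np.clip((epoch - start) / length, 0.0, 1.0))

def total_loss(x, x_hat, z, centroids, z_prior, epoch,
               warmup=10, ramp_len=10, mmd_weight=1.0):
    # Epoch-dependent weighting: the clustering term is switched on gradually
    # after a reconstruction-only warm-up (an assumption of this sketch).
    w_clust = ramp(epoch, warmup, ramp_len)
    return (reconstruction_loss(x, x_hat)
            + w_clust * clustering_loss(z, centroids)
            + mmd_weight * rbf_mmd(z, z_prior))
```

With this schedule the model is trained on reconstruction and latent regularization alone during the warm-up, and the clustering term reaches full weight at epoch `warmup + ramp_len`.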

Abstract

Extended Vision techniques are ubiquitous in physics. However, the data cubes stemming from such analyses are often hard to interpret, due to the intrinsic difficulty of discerning the relevant information in the spectra composing the data cube. Furthermore, the huge dimensionality of data cube spectra makes their statistical interpretation a complex task; nevertheless, this complexity contains a massive amount of statistical information that can be exploited in an unsupervised manner to outline some essential properties of the case study at hand, e.g. it is possible to obtain an image segmentation via (deep) clustering of the data cube's spectra, performed in a suitably defined low-dimensional embedding space. To tackle this topic, we explore the possibility of applying unsupervised clustering methods in encoded space, i.e. performing deep clustering on the spectral properties of datacube pixels. A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, in charge of mapping spectra into lower-dimensional metric spaces, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm. We apply this technique to two different use cases, of different physical origins: a set of synthetic Macro mapping X-Ray Fluorescence (MA-XRF) data on pictorial artworks, and a dataset of simulated astrophysical observations.
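
The pipeline in the abstract (flatten the datacube into per-pixel spectra, embed them in a low-dimensional space, cluster there, and reshape the labels into a segmentation map) can be illustrated with a small sketch. Two stand-ins are used here and are not the paper's components: a linear PCA projection replaces the trained (Variational) AutoEncoder, and plain Lloyd K-Means with a deterministic farthest-point initialization replaces the learnable iterative K-Means. The `(height, width, channels)` axis layout, the function names, and the toy cube are likewise assumptions of this example.

```python
import numpy as np

def encode_linear(spectra, latent_dim):
    # Stand-in encoder: project centred spectra onto the top principal axes.
    centered = spectra - spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:latent_dim].T

def kmeans(z, k, n_iter=50):
    # Farthest-point initialization, then standard Lloyd iterations.
    centroids = [z[0]]
    for _ in range(1, k):
        d2 = np.min([((z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(z[d2.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = z[labels == j].mean(axis=0)
    return labels, centroids

def segment(cube, k, latent_dim=2):
    # Each pixel's spectrum is one sample; cluster labels form the image.
    h, w, c = cube.shape
    z = encode_linear(cube.reshape(h * w, c), latent_dim)
    labels, _ = kmeans(z, k)
    return labels.reshape(h, w)

# Toy cube: left half has a peak at channel 8, right half at channel 24.
rng = np.random.default_rng(0)
channels = np.arange(32)
peak_a = np.exp(-0.5 * ((channels - 8) / 2.0) ** 2)
peak_b = np.exp(-0.5 * ((channels - 24) / 2.0) ** 2)
cube = np.empty((8, 8, 32))
cube[:, :4] = peak_a
cube[:, 4:] = peak_b
cube += 0.01 * rng.standard_normal(cube.shape)

label_map = segment(cube, k=2)
```

Here `label_map` has one integer label per pixel, so it can be displayed directly as a segmentation image or used as a mask to average the spectra within each cluster.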

Paper Structure (21 sections, 13 equations, 20 figures, 1 table, 1 algorithm)

Figures (20)

  • Figure 1: Graphical representation of an Autoencoder architecture.
  • Figure 2: A visual representation of the Deep Clustering architecture.
  • Figure 3: RGB seed image for the synthetic test datacube.
  • Figure 4: Clustered datacube using the trained model. Left: binary map of the cluster pixels. Middle: reconstructed signal averaged over the cluster. Right: "true" signal averaged over the cluster.
  • Figure 5: Energy integrated spectral data-cubes.
  • ...and 15 more figures