Affinity-VAE: incorporating prior knowledge in representation learning from scientific images

Marjan Famili, Jola Mirecka, Camila Rangel Smith, Anna Kotańska, Nikolai Juraschko, Beatriz Costa-Gomes, Colin M. Palmer, Jeyan Thiyagalingam, Tom Burnley, Mark Basham, Alan R. Lowe

TL;DR

Affinity-VAE extends beta-VAE with a prior-informed latent regularization that uses a precomputed affinity matrix to align latent similarity with known shape relationships in scientific images. The model disentangles shape from pose using a dedicated pose channel and supports two decoders—CNN and Gaussian mixture—to produce interpretable latent representations and approximate densities, enabling both classification and shape/pose inference in cryo-ET tomograms. Evaluations on alphanumeric data and SHREC 2021 simulated tomograms show that the latent space organizes by shape similarity, unseen molecules cluster near related training classes, and pose can be inferred, with competitive classification performance and interpretable density reconstructions. The approach holds promise for faster, more principled subtomogram analysis in cryo-ET, though it is limited by dataset size and the quality of the affinity metrics, motivating future work on larger datasets, physics-aware forward models, and NeRF-based reconstructions.
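
The regularization described above can be illustrated with a minimal sketch. It assumes the affinity term compares the cosine similarity of each pair of latent vectors in a mini-batch against the corresponding entry of the precomputed affinity matrix, and that the term is added to the usual beta-VAE objective with its own weight. The function names, the absolute-difference penalty, and the `gamma` weight are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def affinity_loss(z, labels, affinity):
    """Mean absolute gap between latent cosine similarity and prior affinity.

    z        : (N, D) latent shape vectors for one mini-batch
    labels   : (N,) integer class index of each sample
    affinity : (C, C) precomputed class-by-class similarity matrix
    Needs at least two samples, otherwise there are no pairs to compare.
    """
    # Normalise rows so that dot products are cosine similarities.
    z_unit = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    latent_sim = z_unit @ z_unit.T                             # (N, N)
    # Look up the prior similarity for every pair of samples.
    target_sim = affinity[labels[:, None], labels[None, :]]    # (N, N)
    # Count each unordered pair once (strict upper triangle).
    i, j = np.triu_indices(len(z), k=1)
    return float(np.mean(np.abs(latent_sim[i, j] - target_sim[i, j])))

def total_loss(recon, kl, aff, beta=1.0, gamma=1.0):
    """Assumed objective: reconstruction + beta * KL + gamma * affinity."""
    return recon + beta * kl + gamma * aff
```

With an identity affinity matrix, the term vanishes only when same-class latents are parallel and different-class latents are orthogonal; softer off-diagonal entries would pull related classes closer together.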

Abstract

Learning compact and interpretable representations of data is a critical challenge in scientific image analysis. Here, we introduce Affinity-VAE, a generative model that lets us impose our scientific intuition about the similarity of instances in the dataset on the learned representation during training. We demonstrate the utility of the approach in cryo-electron tomography (cryo-ET), where a significant current challenge is to identify similar molecules within a noisy, low-contrast tomographic image volume. This task is distinct from classification in that, at inference time, it is unknown whether or not an instance is part of the training set. We trained Affinity-VAE using prior knowledge of protein structure to inform the latent space. Our model creates rotationally invariant, morphologically homogeneous clusters in the latent representation, with improved cluster separation compared to other approaches. It achieves competitive performance on protein classification, with the added benefits of disentangling object pose from structural similarity and providing an interpretable latent representation. In the context of cryo-ET data, Affinity-VAE captures the 3D orientation of identified proteins, which can be used as a prior for subsequent scientific experiments. Extracting physical principles from a trained network is of significant importance in scientific imaging, where a ground-truth training set is not always feasible.

Paper Structure (20 sections, 4 equations, 7 figures, 1 table)


Figures (7)

  • Figure 1: Architecture of Affinity-VAE. The mini-batch of training examples (which can be either 2D or 3D images) is encoded by the image encoder as vectors of latent representations of the underlying shape and corresponding vectors representing the intra-class variance arising, for example, from differences in rotational pose ($\psi$). The concatenated latent representations ($\mathbf{z}$) and poses ($\psi$) are used by an image decoder to reconstruct the outputs. During training, a pre-computed affinity matrix, representing the shape similarity score between the known object classes ($Y$) in the training dataset ($X$), is used to adjust the latent encodings ($z_1, \ldots, z_N$) to better represent our prior knowledge of their shapes. At inference time, Affinity-VAE uses the latent representation to encode and classify unseen objects based on their affinity with the training classes. Four-character codes refer to unique PDB accession codes.
  • Figure 2: The performance of CNN-aVAE using a simple alphanumeric dataset with rotation as the source of intra-class variation. (a) UMAP embedding of latent vectors of the validation set built from the prediction for 200 samples, with random rotations, of seen (a, b, d, e, i, j, z, 2 and u) and unseen data (v). (b) Interpolation of the pose channel for an average point in the latent space (pink) and the interpolation of the pose channel conditioned on individual classes. (c) The correlation of the inferred 1D pose with the angle of rotation ($\theta$) of the input image. (d) The confusion matrix for classification of the validation set.
  • Figure 3: Learning a representation of protein shapes. (a) The affinity matrix calculated using SOAP descriptors. (b) UMAP embedding of latent vectors from CNN-aVAE, where 3D2F and 3H84 are unseen during training and used only for evaluation. The placement of these proteins in the latent space is as expected from their shape similarity to the training classes. (c) Interpolation within the 1D pose channel for four example proteins.
  • Figure 4: Decoded images interpolating within the pose channel for the embedding of 2CG9 using the Gaussian mixture decoder.
  • Figure 5: Linear interpolation between the latent encodings of 1D8Q (left) and 4CR2 (right) using a Gaussian mixture decoder.
  • ...and 2 more figures
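
The linear interpolation between two latent encodings mentioned in the Figure 5 caption can be sketched as follows. This is a minimal illustration that only generates the intermediate latent vectors; the function name `interpolate_latents` and the number of steps are assumptions, and decoding each point would require the trained decoder, which is not shown here.

```python
import numpy as np

def interpolate_latents(z_start, z_end, steps=5):
    """Return `steps` evenly spaced points on the line from z_start to z_end.

    z_start, z_end : (D,) latent encodings of two inputs
    Returns an array of shape (steps, D); the first row equals z_start
    and the last row equals z_end.
    """
    t = np.linspace(0.0, 1.0, steps)[:, None]            # (steps, 1)
    return (1.0 - t) * z_start[None, :] + t * z_end[None, :]
```

Each row of the result could then be passed, together with a fixed pose value, to a decoder to visualise the transition between the two shapes.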