
Cryo-SWAN: the Multi-Scale Wavelet-decomposition-inspired Autoencoder Network for molecular density representation of molecular volumes

Rui Li, Artsemi Yushkevich, Mikhail Kudryashev, Artur Yakimovich

TL;DR

Cryo-SWAN is a voxel-based variational autoencoder, inspired by multi-scale wavelet decomposition, that performs conditional coarse-to-fine latent encoding and recursive residual quantization across perception scales, enabling accurate capture of both global geometry and high-frequency structural detail in molecular density volumes.
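Cryo-SWAN is described as *inspired by* multi-scale wavelet decomposition; this summary does not say that the model applies a fixed wavelet transform. As a hedged illustration of the underlying idea only, the sketch below applies an orthonormal 3D Haar transform (a standard wavelet, chosen here as an assumption) to a voxel grid. It splits the grid into a low-frequency approximation and seven high-frequency detail subbands, then repeats the split on the approximation to build a coarse-to-fine pyramid. The names `haar3d_level` and `multiscale` are hypothetical.

```python
import numpy as np

def _even_odd(x, axis):
    """Split x into its even- and odd-indexed samples along one axis."""
    even = [slice(None)] * x.ndim
    odd = [slice(None)] * x.ndim
    even[axis] = slice(0, None, 2)
    odd[axis] = slice(1, None, 2)
    return x[tuple(even)], x[tuple(odd)]

def haar3d_level(vol):
    """One level of an orthonormal 3D Haar transform (illustrative, not Cryo-SWAN's encoder).

    Returns the low-pass "LLL" approximation at half resolution and a dict
    of the seven detail subbands (keys such as "LLH", "HLH", "HHH").
    """
    vol = np.asarray(vol, dtype=np.float64)
    if vol.ndim != 3 or any(s % 2 for s in vol.shape):
        raise ValueError("expected a 3D volume with even side lengths")
    bands = {"": vol}
    for axis in range(3):
        split = {}
        for key, x in bands.items():
            a, b = _even_odd(x, axis)
            split[key + "L"] = (a + b) / np.sqrt(2.0)  # low-pass along this axis
            split[key + "H"] = (a - b) / np.sqrt(2.0)  # high-pass along this axis
        bands = split
    low = bands.pop("LLL")
    return low, bands

def multiscale(vol, levels=2):
    """Recursively decompose the low-pass band, collecting details per level."""
    low, details = vol, []
    for _ in range(levels):
        low, d = haar3d_level(low)
        details.append(d)
    return low, details
```

Because the transform is orthonormal, the energy of the input equals the summed energy of the coarsest approximation and all detail subbands, so no information is lost between scales; a learned model is free to trade this property for task-specific features.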

Abstract

Learning robust representations of 3D shapes from voxelized data is essential for advancing AI methods in biomedical imaging. However, most contemporary 3D computer vision approaches operate on point clouds, meshes, or octrees, while volumetric density maps, the native format of structural biology and cryo-EM, remain comparatively underexplored. We present Cryo-SWAN, a voxel-based variational autoencoder inspired by multi-scale wavelet decomposition. The model performs conditional coarse-to-fine latent encoding and recursive residual quantization across perception scales, enabling it to accurately capture both global geometry and high-frequency structural detail in molecular density volumes. Evaluated on ModelNet40, BuildingNet, and ProteinNet3D, a newly curated dataset of cryo-EM volumes, Cryo-SWAN consistently improves reconstruction quality over state-of-the-art 3D autoencoders. We demonstrate that molecular densities organize in the learned latent space according to shared geometric features, and that integration with diffusion models enables denoising and conditional shape generation. Together, these results establish Cryo-SWAN as a practical framework for data-driven structural biology and volumetric imaging.
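The recursive residual quantization mentioned above can be illustrated with standard residual vector quantization: each stage looks up the codeword nearest to whatever the previous stages left unexplained, and the selected codewords are summed. The sketch below is a minimal NumPy version under assumptions the source does not state: one codebook per scale, nearest-neighbour lookup by squared Euclidean distance, and a per-stage "residual loss" taken as the mean squared residual left after each lookup (a hypothetical stand-in, not the paper's loss). It omits training details such as straight-through gradients and codebook updates. `residual_quantize` and `codebook_usage` are hypothetical names; the latter counts how often each entry is chosen, a statistic commonly used to detect codebook collapse.

```python
import numpy as np

def residual_quantize(z, codebooks):
    """Residual vector quantization with one codebook per stage (illustrative sketch).

    z: (N, D) latent vectors; codebooks: list of (K_s, D) arrays.
    Returns the summed quantization, the code indices chosen at each stage,
    and the mean squared residual remaining after each stage.
    """
    residual = np.asarray(z, dtype=np.float64).copy()
    quantized = np.zeros_like(residual)
    indices, losses = [], []
    for cb in codebooks:
        # Squared Euclidean distance from every residual to every codeword: (N, K_s).
        d2 = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = d2.argmin(axis=1)          # nearest codeword per vector
        q = cb[idx]
        quantized += q                   # accumulate the coarse-to-fine estimate
        residual = residual - q          # pass what is left to the next stage
        indices.append(idx)
        losses.append(float(np.mean(residual ** 2)))
    return quantized, indices, losses

def codebook_usage(idx, k):
    """Count selections of each of k entries; many zeros suggest collapse."""
    return np.bincount(idx, minlength=k)
```

For example, with `z = [[2, 3]]`, a first codebook `[[0, 0], [2, 2]]` picks `[2, 2]` and leaves `[0, 1]`, which a second codebook containing `[0, 1]` removes exactly. Because later stages only refine earlier ones, truncating after the first stage gives a coarser reconstruction, loosely analogous to the RQ1 vs. RQ1+RQ2 comparison in Figure 4.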

Paper Structure (21 sections, 11 equations, 10 figures, 1 table, 2 algorithms)

Figures (10)

  • Figure 1: Overall architecture of the Multi-Scale Wavelet-decomposition-inspired Autoencoder Network. (a) Multi-scale decomposition of the input signal. (b) The residual quantization process; a residual loss is computed during codebook lookup at each scale. (c) Downstream applications of the voxel-based 3D shape representations.
  • Figure 2: Cryo-SWAN performance on common 3D shape benchmarks. (a) Performance on three representative objects from the BuildingNet dataset. (b) Performance on three representative objects from the ModelNet dataset. Here, low, mix, and high denote frequency components, where mix combines low- and high-frequency details.
  • Figure 3: Representation learning on ProteinNet3D and exploration of the latent space. (a) Representation learning on ProteinNet3D. (b) FSC (Fourier shell correlation) evaluation of all representations. (c) Dimensionality reduction comparison: Cryo-SWAN latent vectors vs. original EMDB volumes. (d) UMAP projection of the latent vectors reveals distinct hubs reflecting structural similarity; siblings are molecules with similar 3D structures at 8 Å resolution.
  • Figure 4: Ablation study of codebook collapse and multi-scale representations. (a) Usage counts for all codebook entries. (b) Latent code representations at both scales (coarse and fine). (c) Molecular density volumes: original (GT), at a single representation scale (RQ1), and at the full representation (RQ1+RQ2). (d) Representation performance at a single scale (RQ1) compared with the full representation (RQ1+RQ2).
  • Figure 5: Downstream applications of Cryo-SWAN, including shape denoising and conditional molecular shape generation. (a) Conditional generation with diffusion models based on Cryo-SWAN representations. (b) Examples of conditional generation; the generated variants are geometrically similar to the real protein densities from EMDB (anchor point). (c) Unsupervised denoising of 3D densities from ModelNet, based on the learned representations.
  • ...and 5 more figures
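Figure 3(b) evaluates representations with FSC (Fourier shell correlation), the standard cryo-EM measure of agreement between two density maps as a function of spatial frequency: for each shell of radius k in Fourier space, FSC(k) = Re Σ F1·conj(F2) / sqrt(Σ|F1|² · Σ|F2|²). The sketch below is a minimal NumPy version, assuming cubic volumes of equal size, integer-radius shells in voxel units, and no masking; the paper's exact FSC protocol is not given here, and `fsc` is a hypothetical helper name.

```python
import numpy as np

def fsc(vol1, vol2):
    """Fourier shell correlation between two equal-sized cubic volumes (sketch).

    Returns one value per integer-radius shell, from the DC term (shell 0)
    up to the Nyquist shell n // 2. No masking or windowing is applied.
    """
    vol1, vol2 = np.asarray(vol1), np.asarray(vol2)
    if vol1.shape != vol2.shape or vol1.ndim != 3 or len(set(vol1.shape)) != 1:
        raise ValueError("volumes must be cubic and of equal shape")
    n = vol1.shape[0]
    # Centre the spectra so the DC component sits at index n // 2.
    f1 = np.fft.fftshift(np.fft.fftn(vol1))
    f2 = np.fft.fftshift(np.fft.fftn(vol2))
    freq = np.arange(n) - n // 2
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    radius = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int)
    n_shells = n // 2 + 1
    mask = radius < n_shells             # drop the corners beyond Nyquist
    r = radius[mask]
    # Sum cross-power and per-map power within each shell.
    num = np.bincount(r, weights=np.real(f1 * np.conj(f2))[mask], minlength=n_shells)
    p1 = np.bincount(r, weights=(np.abs(f1) ** 2)[mask], minlength=n_shells)
    p2 = np.bincount(r, weights=(np.abs(f2) ** 2)[mask], minlength=n_shells)
    return num / np.sqrt(p1 * p2 + 1e-30)  # tiny epsilon guards empty shells
```

In cryo-EM practice, resolution is usually read off where the curve first drops below a threshold such as 0.143 or 0.5, and shell k corresponds to a resolution of n × (voxel size) / k.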