
Towards 1000-fold Electron Microscopy Image Compression for Connectomics via VQ-VAE with Transformer Prior

Fuming Yang, Yicong Li, Hanspeter Pfister, Jeff W. Lichtman, Yaron Meirovitch

TL;DR

The paper tackles the storage and analysis bottlenecks of petascale EM data by introducing a vector-quantized variational autoencoder (VQ-VAE) with a Transformer prior that enables pay-as-you-decode compression from $16\times$ to $1024\times$ while preserving neuronal structures. It presents a two-level VQ-VAE architecture with top and bottom token latents, FiLM-based fusion, and an ROI-driven workflow for selective high-resolution reconstruction, achieving competitive SSIM and robust downstream task performance across datasets. The work demonstrates near-parity with AVIF at moderate compression, largely preserves segmentation and synapse detection performance at extreme ratios, and provides a practical mechanism to extract high-resolution regions on demand. This approach lays the groundwork for a foundation-model-like, token-based EM compression framework that can generalize across connectomic datasets and support scalable, on-demand analysis.
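To give a feel for how a token-based codec can reach ratios in the $16\times$ to $1024\times$ range, the following minimal sketch computes the ratio of raw pixel bits to token index bits. The downsampling factors, codebook size, and 8-bit pixel depth are illustrative assumptions, not values reported in the summary above, and the function name `compression_ratio` is hypothetical.

```python
import math

def compression_ratio(downsample: int, codebook_size: int, bits_per_pixel: int = 8) -> float:
    """Ratio of raw bits to token bits for a 2D image.

    Assumes each token covers a downsample x downsample pixel patch and is
    stored as a fixed-length index of log2(codebook_size) bits, with no
    further entropy coding.
    """
    raw_bits = downsample * downsample * bits_per_pixel
    token_bits = math.log2(codebook_size)
    return raw_bits / token_bits

# Hypothetical configurations (assumptions, not taken from the paper):
print(compression_ratio(4, 256))   # 4x4 patch per 8-bit token
print(compression_ratio(32, 256))  # 32x32 patch per 8-bit token
```

Under these assumed settings, a 4x4 patch per 8-bit token corresponds to a ratio of 16 and a 32x32 patch per 8-bit token to 1024, which is one way a single coarse token grid could account for the extreme end of the range.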

Abstract

Petascale electron microscopy (EM) datasets push storage, transfer, and downstream analysis toward their current limits. We present a vector-quantized variational autoencoder (VQ-VAE)-based compression framework for EM that spans 16x to 1024x and enables pay-as-you-decode usage. Top-only decoding serves extreme compression, while an optional Transformer prior predicts bottom tokens (without changing the compression ratio) to restore texture via feature-wise linear modulation (FiLM) and concatenation. We further introduce an ROI-driven workflow that performs selective high-resolution reconstruction from 1024x-compressed latents only where needed.
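The two building blocks named in the abstract, vector quantization and FiLM followed by concatenation, can be sketched in a few lines of NumPy. This is a generic illustration rather than the authors' implementation: the function names, the weight matrices `w_gamma` and `w_beta`, and the choice to let bottom features modulate top features are all assumptions made for the example.

```python
import numpy as np

def quantize(z, codebook):
    """Map each feature vector in z (N, D) to its nearest codebook entry (K, D)."""
    # Squared Euclidean distance between every vector and every code: shape (N, K)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)   # token indices, the only thing that must be stored
    return idx, codebook[idx]

def film_fuse(top_feat, bottom_feat, w_gamma, w_beta):
    """FiLM: scale and shift top features using bottom features, then concatenate.

    The direction of conditioning (bottom modulates top) is an assumption.
    """
    gamma = bottom_feat @ w_gamma            # per-feature scale, shape (N, C_top)
    beta = bottom_feat @ w_beta              # per-feature shift, shape (N, C_top)
    modulated = gamma * top_feat + beta      # feature-wise linear modulation
    return np.concatenate([modulated, bottom_feat], axis=-1)

# Toy data: a 4-entry codebook and two noisy feature vectors.
codebook = np.eye(4)
z = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 0.2, 0.8]])
idx, bottom = quantize(z, codebook)

top = np.arange(6, dtype=float).reshape(2, 3)
fused = film_fuse(top, bottom, np.ones((4, 3)), np.zeros((4, 3)))
print(idx, fused.shape)
```

In a trained model, `w_gamma` and `w_beta` would be learned layers and the bottom tokens would come from the Transformer prior rather than from the encoder, which is why predicting them adds no stored bits.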

Paper Structure

This paper contains 7 sections, 2 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Left: Encoder and Decoder training. Right: Image reconstruction.
  • Figure 2: 2D segmentation comparison for AVIF at 16x and ours at 16x, 256x, and 1024x.
  • Figure 3: SSIM comparison. (A) Ours vs. AVIF. (B) Extended ratios of ours.
  • Figure 4: Synapse prediction on the compressed EM
  • Figure 5: Selective high-resolution mitochondria from 1024$\times$ compressed EM.
  • ...and 1 more figure