Towards 1000-fold Electron Microscopy Image Compression for Connectomics via VQ-VAE with Transformer Prior
Fuming Yang, Yicong Li, Hanspeter Pfister, Jeff W. Lichtman, Yaron Meirovitch
TL;DR
The paper tackles the storage and analysis bottlenecks of petascale EM data by introducing a vector-quantized variational autoencoder (VQ-VAE) with a Transformer prior that enables pay-as-you-decode compression from $16\times$ to $1024\times$ while preserving neuronal structures. It presents a two-level VQ-VAE architecture with top and bottom token latents, FiLM-based fusion, and an ROI-driven workflow for selective high-resolution reconstruction, achieving competitive SSIM and robust downstream task performance across datasets. The work demonstrates near-parity with AVIF at moderate compression and strong preservation of segmentation and synapse detection performance at extreme ratios, plus a practical mechanism to extract high-resolution regions on demand. This approach lays the groundwork for a foundation-model-like, token-based EM compression framework that can generalize across connectomic datasets and support scalable, on-demand analysis.
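To make the "FiLM-based fusion" step concrete, here is a minimal pure-Python sketch of feature-wise linear modulation: each feature channel is scaled by a per-channel `gamma` and shifted by a per-channel `beta`. The function name `film`, the toy feature values, and the choice of which stream modulates which are assumptions for illustration only; the summary does not specify how the authors wire the top and bottom latents together.

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each channel.

    features: list of channels, each a list of values.
    gamma, beta: one scale and one shift per channel.
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Toy decoder features: 2 channels x 3 positions (made-up numbers).
decoder_features = [[1.0, 2.0, 3.0],
                    [4.0, 4.0, 4.0]]

# In a trained model gamma and beta would be predicted from the
# conditioning tokens by a small learned network; here they are fixed.
gamma = [2.0, 0.5]
beta = [1.0, -1.0]

modulated = film(decoder_features, gamma, beta)
print(modulated)  # [[3.0, 5.0, 7.0], [1.0, 1.0, 1.0]]

# Channel-wise concatenation with a (made-up) conditioning feature map.
conditioning_features = [[0.0, 1.0, 0.0]]
fused = modulated + conditioning_features
```

The modulation itself has no spatial mixing, so it is cheap to apply at decode time; any texture detail has to come from the conditioning signal that produces `gamma` and `beta`.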
Abstract
Petascale electron microscopy (EM) datasets push storage, transfer, and downstream analysis toward their current limits. We present a compression framework for EM, based on a vector-quantized variational autoencoder (VQ-VAE), that spans 16x to 1024x compression and enables pay-as-you-decode usage. Top-only decoding provides extreme compression, and an optional Transformer prior predicts bottom tokens (without changing the compression ratio) to restore texture via feature-wise linear modulation (FiLM) and concatenation. We further introduce an ROI-driven workflow that performs selective high-resolution reconstruction from 1024x-compressed latents only where needed.
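The remark that predicting bottom tokens does not change the compression ratio follows from simple bit accounting: the ratio depends only on what is stored, and predicted tokens are regenerated at decode time. The sketch below illustrates this under hypothetical settings that are not taken from the paper (8-bit grayscale pixels, a 256-entry codebook, one top token per 32x32 pixel block, and 64 bottom tokens per block); the helper `ratio_for_block` is likewise an illustrative name.

```python
import math


def ratio_for_block(block_side, stored_tokens, codebook_size, bits_per_pixel=8):
    """Raw bits in a square pixel block divided by the bits of the stored tokens."""
    raw_bits = bits_per_pixel * block_side ** 2
    stored_bits = stored_tokens * math.log2(codebook_size)
    return raw_bits / stored_bits


# Hypothetical configuration (assumed, not from the paper).
BLOCK_SIDE = 32       # pixels per side covered by one top token
CODEBOOK_SIZE = 256   # 8 bits per token index
BOTTOM_TOKENS = 64    # bottom tokens covering the same block

# Top-only storage: one token index per block.
top_only = ratio_for_block(BLOCK_SIDE, 1, CODEBOOK_SIZE)

# Bottom tokens predicted by a prior: nothing extra is stored,
# so the stored-token count (and the ratio) is unchanged.
top_plus_predicted_bottom = ratio_for_block(BLOCK_SIDE, 1, CODEBOOK_SIZE)

# Bottom tokens stored explicitly: the ratio drops sharply.
top_plus_stored_bottom = ratio_for_block(BLOCK_SIDE, 1 + BOTTOM_TOKENS, CODEBOOK_SIZE)

print(top_only)  # 1024.0
print(top_plus_predicted_bottom == top_only)  # True
print(top_plus_stored_bottom < top_only)  # True
```

Under these assumed numbers the top-only path reaches 1024x, and the optional prior trades decode-time compute for texture without adding any stored bits.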
