PCA-VAE: Differentiable Subspace Quantization without Codebook Collapse
Hao Lu, Onur C. Koyun, Yongxin Guo, Zhengjie Zhu, Abbas Alili, Metin Nafi Gurcan
TL;DR
This work addresses the non-differentiability and collapse risks of vector quantization (VQ) in deep generative models by replacing the quantizer with an online PCA layer learned via Oja's rule. In the PCA-VAE framework, the discrete codebook gives way to a differentiable, orthogonal latent projection $\hat{\mathbf{h}} = C C^{\top} (\mathbf{h}-\boldsymbol{\mu}) + \boldsymbol{\mu}$, with PCA parameters updated outside standard backpropagation. Experiments on CelebA-HQ show that PCA-VAE achieves reconstruction quality competitive with or surpassing VQ-based methods while using 10×–100× fewer latent bits, and reveal interpretable, variance-ordered latent axes (e.g., illumination, pose, gender cues). The results suggest PCA as a principled alternative to vector quantization, offering stability, bit-efficiency, and semantic structure, with broad applicability beyond discrete tokenizers.
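To make the projection $\hat{\mathbf{h}} = C C^{\top} (\mathbf{h}-\boldsymbol{\mu}) + \boldsymbol{\mu}$ concrete, here is a minimal NumPy sketch of an online PCA bottleneck. It is an illustration under stated assumptions, not the authors' implementation: it assumes $C$ is a $d \times k$ matrix with orthonormal columns and $\boldsymbol{\mu}$ is a running mean of the encoder features, uses the subspace form of Oja's rule, and re-orthonormalizes with a QR step for numerical stability. The class name `OnlinePCABottleneck` and the parameters `lr` and `momentum` are hypothetical.

```python
import numpy as np


class OnlinePCABottleneck:
    """Illustrative online PCA layer (hypothetical names, not the paper's code)."""

    def __init__(self, dim, k, lr=1e-2, momentum=0.99, seed=0):
        rng = np.random.default_rng(seed)
        # Start from a random basis with orthonormal columns, shape (dim, k).
        self.C, _ = np.linalg.qr(rng.standard_normal((dim, k)))
        self.mu = np.zeros(dim)   # running mean of the features (assumption)
        self.lr = lr
        self.momentum = momentum

    def project(self, h):
        """Return h_hat = C C^T (h - mu) + mu for a batch h of shape (n, dim)."""
        centered = h - self.mu
        return centered @ self.C @ self.C.T + self.mu

    def update(self, h):
        """One Oja-style step on a batch, applied outside backpropagation."""
        # Exponential moving average of the batch mean (assumption).
        self.mu = self.momentum * self.mu + (1.0 - self.momentum) * h.mean(axis=0)
        x = h - self.mu               # centered features, (n, dim)
        y = x @ self.C                # coordinates in the current basis, (n, k)
        n = h.shape[0]
        # Oja's subspace rule: C <- C + lr * (x^T y - C y^T y) / n
        self.C = self.C + self.lr * (x.T @ y - self.C @ (y.T @ y)) / n
        # QR step keeps the columns orthonormal (added here for stability).
        self.C, _ = np.linalg.qr(self.C)


# Usage: update the basis on a batch of encoder features, then project.
layer = OnlinePCABottleneck(dim=8, k=3)
features = np.random.default_rng(1).standard_normal((32, 8))
layer.update(features)
reconstructed = layer.project(features)
```

In a training loop, `update` would be called on detached encoder features, while `project` would sit in the forward pass; because the projection is linear in $\mathbf{h}$, gradients can flow through it to the encoder without a straight-through estimator.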
Abstract
Vector-quantized autoencoders deliver high-fidelity latents but suffer inherent flaws: the quantizer is non-differentiable, requires straight-through hacks, and is prone to collapse. We address these issues at the root by replacing VQ with a simple, principled, and fully differentiable alternative: an online PCA bottleneck trained via Oja's rule. The resulting model, PCA-VAE, learns an orthogonal, variance-ordered latent basis without codebooks, commitment losses, or lookup noise. Despite its simplicity, PCA-VAE exceeds VQ-GAN and SimVQ in reconstruction quality on CelebA-HQ while using 10×–100× fewer latent bits. It also produces naturally interpretable dimensions (e.g., pose, lighting, gender cues) without adversarial regularization or disentanglement objectives. These results suggest that PCA is a viable replacement for VQ: mathematically grounded, stable, bit-efficient, and semantically structured, offering a new direction for generative models beyond vector quantization.
