Dual Codebook VQ: Enhanced Image Reconstruction with Reduced Codebook Size

Parisa Boodaghi Malidarreh, Jillur Rahman Saurav, Thuong Le Hoai Pham, Amir Hajighasemi, Anahita Samadi, Saurabh Shrinivas Maydeo, Mohammad Sadegh Nasr, Jacob M. Luber

TL;DR

This paper tackles the codebook utilization bottleneck in vector-quantized image reconstruction by introducing Dual Codebook VQ, which partitions latent representations between a global codebook updated by a lightweight transformer and a local codebook updated by deterministic selection. Both codebooks are trained from scratch, enabling high-fidelity reconstructions with a compact 512-entry codebook and no pretrained priors. Empirically, the approach achieves state-of-the-art or competitive results on ADE20K, MS-COCO, and CelebA-HQ, including notable FID improvements (lower is better; e.g., ADE20K: 17.03 vs 20.25; MS-COCO: 4.19 vs 9.82) while using smaller codebooks than prior methods such as VQCT. The method is computationally efficient, integrates with VQ-GAN frameworks, and improves codebook utilization, offering a practical route to high-quality image reconstruction under resource constraints.

Abstract

Vector Quantization (VQ) techniques face significant challenges in codebook utilization, limiting reconstruction fidelity in image modeling. We introduce a Dual Codebook mechanism that effectively addresses this limitation by partitioning the representation into complementary global and local components. The global codebook employs a lightweight transformer for concurrent updates of all code vectors, while the local codebook maintains precise feature representation through deterministic selection. This complementary approach is trained from scratch without requiring pre-trained knowledge. Experimental evaluation across multiple standard benchmark datasets demonstrates state-of-the-art reconstruction quality while using a compact codebook of size 512 - half the size of previous methods that require pre-training. Our approach achieves significant FID improvements across diverse image domains, particularly excelling in scene and face reconstruction tasks. These results establish Dual Codebook VQ as an efficient paradigm for high-fidelity image reconstruction with significantly reduced computational requirements.
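
The partition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the 512 entries are split evenly into 256 global and 256 local codes, both halves use nearest-neighbour lookup, the latent has 256 channels, and the lightweight transformer is replaced by a single self-attention pass with random weights. The names `nearest_code`, `refine_codebook`, and `dual_quantize` are hypothetical.

```python
import numpy as np

def nearest_code(z, codebook):
    """Nearest-neighbour lookup: z is (N, d), codebook is (K, d)."""
    # Squared L2 distance between every latent vector and every code vector.
    d2 = (z ** 2).sum(axis=1, keepdims=True) - 2.0 * z @ codebook.T + (codebook ** 2).sum(axis=1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

def refine_codebook(codebook, wq, wk, wv):
    """Stand-in for the lightweight transformer (assumption): one
    self-attention pass that updates all code vectors at once."""
    q, k, v = codebook @ wq, codebook @ wk, codebook @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return codebook + attn @ v  # residual connection

def dual_quantize(z, global_cb, local_cb, attn_weights):
    """Split channels in half, quantize each half with its own codebook,
    then concatenate the quantized halves."""
    half = z.shape[1] // 2
    z_global, z_local = z[:, :half], z[:, half:]
    refined_cb = refine_codebook(global_cb, *attn_weights)
    g_idx, g_q = nearest_code(z_global, refined_cb)
    l_idx, l_q = nearest_code(z_local, local_cb)  # deterministic selection
    return np.concatenate([g_q, l_q], axis=1), g_idx, l_idx

rng = np.random.default_rng(0)
z = rng.normal(size=(16 * 16, 256))      # 16x16 latent grid, 256 channels (assumed)
global_cb = rng.normal(size=(256, 128))  # assumed even split of the 512 codes
local_cb = rng.normal(size=(256, 128))
attn_weights = tuple(rng.normal(scale=0.05, size=(128, 128)) for _ in range(3))

z_q, g_idx, l_idx = dual_quantize(z, global_cb, local_cb, attn_weights)
print(z_q.shape)
```

In a trained model the concatenated `z_q` would be passed to the decoder. The sketch leaves out the training losses and gradient handling because this summary does not describe them.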

Paper Structure

This paper contains 22 sections, 4 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Comparison of reconstruction quality across different datasets. Our Dual Codebook method produces sharp details and preserves textures across different domains.
  • Figure 2: Overview of our Dual Codebook framework, comprising an encoder, a dual codebook mechanism, a decoder, and a discriminator. The encoder maps an input image to spatially continuous vectors, which are then split into two halves. The first half is quantized with the global codebook, which is updated by a lightweight transformer, while the second half is quantized with the local codebook, which is updated via a deterministic quantizer. After each continuous vector is mapped to a discrete code vector, the two halves are concatenated and fed into the decoder, which reconstructs the image from the quantized representation. A discriminator is employed to incorporate GAN-based training objectives.
  • Figure 3: Visualization of the codebook (first row) and illustration of codebook usage (second row) on the MS-COCO dataset for VQ-GAN (blue) and Dual Codebook VQ (red and blue). In this comparison, VQ-GAN was tested with a codebook size of 1024, whereas our Dual Codebook was tested with a total codebook size of 512 (for Dual Codebook, the red graph shows global codebook usage and the blue graph shows local codebook usage).
  • Figure 4: Comparison of reconstructed images from our Dual Codebook and VQ-GAN on two datasets. Both models are trained under the same settings and the same compression ratio ($768\times$, i.e., $256 \times 256 \times 3 \rightarrow 16 \times 16$) but with different codebook sizes (1024 for VQ-GAN and 512 for Dual Codebook). Our proposed quantization method significantly improves reconstruction quality, particularly enhancing texture details in building facades and the sky background in the ADE20K dataset. In the second row, for MS-COCO, it not only refines the anatomical details of the man but also improves the reconstruction accuracy and color fidelity of the man's helmet (red boxes highlight the reconstruction details).
  • Figure 5: Best (first row) and worst (second row) reconstruction results on CelebA-HQ (test split) with the Dual Codebook quantizer, selected by PSNR value.
  • ...and 4 more figures
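
Two quantities from the captions above can be checked with a short Python sketch: the $768\times$ compression ratio in Figure 4 and the kind of codebook usage statistic plotted in Figure 3. The helper names `compression_ratio` and `codebook_usage` are hypothetical, and the index list is toy data, not results from the paper. Usage is taken here to be the fraction of distinct codes selected at least once, which is one common definition and may differ from the paper's.

```python
from collections import Counter

def compression_ratio(height, width, channels, latent_h, latent_w):
    """Input values per latent index: (H * W * C) / (h * w)."""
    return (height * width * channels) / (latent_h * latent_w)

def codebook_usage(indices, codebook_size):
    """Fraction of codebook entries selected at least once, plus per-code counts."""
    counts = Counter(indices)
    return len(counts) / codebook_size, counts

# Figure 4 setting: 256 x 256 x 3 image -> 16 x 16 grid of code indices.
ratio = compression_ratio(256, 256, 3, 16, 16)
print(ratio)  # 768.0

# Toy index sequence (illustrative only) over a hypothetical 8-entry codebook.
usage, counts = codebook_usage([0, 3, 3, 7, 0, 3], codebook_size=8)
print(usage)  # 0.375
```

For a dual codebook, the same function could be applied separately to the global and local index streams, which is how the red and blue curves in Figure 3 are distinguished.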