Remote Inference over Dynamic Links via Adaptive Rate Deep Task-Oriented Vector Quantization

Eyal Fishel, May Malka, Shai Ginzach, Nir Shlezinger

TL;DR

Remote inference over dynamic, rate-limited channels is challenging for static learned compression. ARTOVeQ introduces a rate-adaptive, task-oriented vector quantizer built on a single nested codebook that supports multi-rate and progressive decoding via progressive learning, mixed-resolution quantization, and nested quantization. A three-stage training procedure initializes, bootstraps, and adapts the codebook, enabling a single model to operate across a broad range of bit budgets while maintaining high accuracy. Experiments on CIFAR-100 and Imagewoof with MobileNetV2 show that ARTOVeQ closely approaches the performance of separate fixed-rate models while offering rapid initial inference and graceful improvement as more bits arrive, making remote inference practical over dynamic networks.

Abstract

A broad range of technologies rely on remote inference, wherein acquired data is conveyed over a communication channel for inference at a remote server. Communication between the participating entities is often carried out over rate-limited channels, necessitating data compression to reduce latency. While deep learning facilitates joint design of the compression mapping along with the encoding and inference rules, existing learned compression mechanisms are static and struggle to adapt their resolution to changes in channel conditions and to dynamic links. To address this, we propose Adaptive Rate Task-Oriented Vector Quantization (ARTOVeQ), a learned compression mechanism that is tailored for remote inference over dynamic links. ARTOVeQ is based on designing nested codebooks along with a learning algorithm employing progressive learning. We show that ARTOVeQ extends to support low-latency inference that is gradually refined via successive refinement principles, and that it enables the simultaneous usage of multiple resolutions when conveying high-dimensional data. Numerical results demonstrate that the proposed scheme yields remote deep inference that operates with multiple rates, supports a broad range of bit budgets, and facilitates rapid inference that gradually improves as more bits are exchanged, while approaching the performance of single-rate deep quantization methods.
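To make the nested-codebook idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes one particular form of nesting, namely that the first 2^b entries of the largest codebook form the b-bit codebook, so a single stored table serves every rate. The codebook values, the feature vector, and the function names (`nearest_index`, `quantize_at_rate`) are hypothetical and chosen only for illustration; in ARTOVeQ the codewords are learned.

```python
def nearest_index(vec, codebook):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
    )


def quantize_at_rate(vec, nested_codebook, bits):
    """Quantize vec using only the first 2**bits codewords.

    Assumption: nesting is by prefix, i.e. the b-bit codebook is the
    first 2**b entries of the full table.
    """
    size = 2 ** bits
    if size > len(nested_codebook):
        raise ValueError("bit budget exceeds the stored codebook size")
    idx = nearest_index(vec, nested_codebook[:size])
    return idx, nested_codebook[idx]


# Hypothetical hand-picked codebook (embedding dimension d = 2, up to 3 bits).
NESTED_CODEBOOK = [
    (0.0, 0.0), (1.0, 1.0),                          # available at 1 bit
    (1.0, 0.0), (0.0, 1.0),                          # added at 2 bits
    (0.5, 0.5), (0.5, 0.0), (0.0, 0.5), (1.0, 0.5),  # added at 3 bits
]

feature = (0.9, 0.45)  # hypothetical encoder output sub-vector
results = {b: quantize_at_rate(feature, NESTED_CODEBOOK, b) for b in (1, 2, 3)}

for bits, (idx, codeword) in results.items():
    print(f"{bits} bit(s): index {idx}, codeword {codeword}")
```

Because every lower-rate codebook is a prefix of the higher-rate one, the sender can switch bit budgets per transmission without swapping models, which is the property the abstract attributes to nested codebooks.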

Paper Structure (24 sections, 10 equations, 11 figures, 1 table, 2 algorithms)

Figures (11)

  • Figure 1: Remote inference system illustration
  • Figure 2: VQ-VAE architecture. The encoder maps the input ${\boldsymbol{x}}$ into the features ${\boldsymbol{x}}^{\rm e}$, which are divided into $M$ sub-vectors of size $d\times 1$. Each sub-vector undergoes the vector quantization mechanism, which selects an embedding based on its distance from the codebook vectors. The decoder is applied to the collection of quantized sub-vectors for inference.
  • Figure 3: ARTOVeQ training illustration
  • Figure 4: Mixed-resolution ARTOVeQ illustration. Different colors represent different quantization resolutions.
  • Figure 5: Learned codebook vectors. Embedding dimension $d=2$.
  • ...and 6 more figures
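
The quantization step described in the Figure 2 caption can be sketched as follows. This is an illustrative sketch only: it assumes squared Euclidean distance as the selection rule and a fixed-length index per sub-vector, and the codebook, the feature values, and the helper names (`split_into_subvectors`, `vector_quantize`) are hypothetical. The encoder and decoder networks are omitted.

```python
def split_into_subvectors(features, d):
    """Divide a flat feature list into M sub-vectors of size d."""
    if len(features) % d != 0:
        raise ValueError("feature length must be a multiple of d")
    return [tuple(features[i:i + d]) for i in range(0, len(features), d)]


def vector_quantize(features, codebook, d):
    """Map each sub-vector to its nearest codeword (squared Euclidean distance)."""
    indices = []
    for sub in split_into_subvectors(features, d):
        dists = [sum((s - c) ** 2 for s, c in zip(sub, cw)) for cw in codebook]
        indices.append(dists.index(min(dists)))
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Hypothetical codebook with 4 codewords (2 bits per index), d = 2.
CODEBOOK = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

# Hypothetical encoder output x^e with M = 3 sub-vectors of size d = 2.
encoded_features = [0.1, 0.9, 0.8, 0.2, 0.6, 0.7]

indices, quantized = vector_quantize(encoded_features, CODEBOOK, d=2)

# With fixed-length indices, M sub-vectors cost M * log2(codebook size) bits.
bits_sent = len(indices) * (len(CODEBOOK) - 1).bit_length()

print(indices, quantized, bits_sent)
```

Only the indices need to cross the channel; the remote side looks up the same codebook to rebuild the quantized sub-vectors before applying the decoder.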