Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios

Raül Pérez-Gonzalo, Andreas Espersen, Antonio Agudo

TL;DR

This paper addresses the rate-distortion trade-off in learned image compression using $L$-level nested latent variable models. The method introduces a generalized relaxed rate-distortion loss and a Markov-chain latent structure to capture dependencies, with a logistic prior and a common latent dimension to control complexity. Key findings show that beyond $L=2$ a trainable prior becomes detrimental, while a carefully chosen $L$ achieves state-of-the-art performance at lower computational cost; the framework can also approximate autoregressive coders. The approach is validated on wind turbine blade imagery, demonstrating practical applicability to automated blade inspections and visual quality assurance.
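To make the role of the logistic prior concrete, the following is a minimal sketch of how the bit cost of integer-quantized latents can be estimated under a discretized logistic distribution and summed over the levels of a nested model. It is an illustration under stated assumptions, not the paper's implementation: the function names (`logistic_cdf`, `logistic_bits`, `nested_rate_bits`), the unit-width quantization bins, and the fixed top-level parameters are all hypothetical choices made here.

```python
import numpy as np

def logistic_cdf(x, mu=0.0, s=1.0):
    """CDF of a logistic distribution, written with tanh for numerical stability."""
    return 0.5 * (1.0 + np.tanh(0.5 * (x - mu) / s))

def logistic_bits(z_hat, mu=0.0, s=1.0, eps=1e-9):
    """Estimated bits per element for integer-quantized latents z_hat.

    Assumption: each integer owns the unit-width bin [z - 0.5, z + 0.5],
    so its probability mass is a difference of two CDF values.
    """
    z_hat = np.asarray(z_hat, dtype=np.float64)
    p = logistic_cdf(z_hat + 0.5, mu, s) - logistic_cdf(z_hat - 0.5, mu, s)
    return -np.log2(np.maximum(p, eps))  # clip to avoid log2(0) in the tails

def nested_rate_bits(latents, cond_params, top_mu=0.0, top_s=1.0):
    """Total estimated rate of a nested model with latents [z_1, ..., z_L].

    cond_params holds one (mu, s) pair per level 1..L-1, standing in for the
    parameters that level l+1 would predict for level l (hypothetical interface).
    The top level z_L is coded with a fixed, non-trainable logistic prior.
    """
    if len(cond_params) != len(latents) - 1:
        raise ValueError("need one (mu, s) pair for every level except the top")
    total = logistic_bits(latents[-1], top_mu, top_s).sum()
    for z, (mu, s) in zip(latents[:-1], cond_params):
        total += logistic_bits(z, mu, s).sum()
    return float(total)
```

In a trained model the `(mu, s)` pairs would come from the decoder networks at each level; here they are passed in directly so the rate computation can be inspected in isolation.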

Abstract

Rate-distortion optimization through neural networks has achieved competitive results in compression efficiency and image quality. This learning-based approach seeks to balance compression rate against reconstructed image quality by automatically extracting and retaining crucial information while discarding less critical details. A successful technique consists of introducing a deep hyperprior that operates within a 2-level nested latent variable model, enhancing compression by capturing complex data dependencies. This paper extends this concept by designing a generalized $L$-level nested generative model with a Markov chain structure. We demonstrate that, as $L$ increases, a trainable prior becomes detrimental, and we explore a common dimensionality across the distinct latent variables to boost compression performance. As this structured framework can represent autoregressive coders, we outperform the hyperprior model and achieve state-of-the-art performance while substantially reducing the computational cost. Our experimental evaluation is performed on wind turbine scenarios to study its application to visual inspections.
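As a reading aid, the Markov chain structure described above can be written out explicitly. The notation below is an assumption made for illustration and is not copied from the paper's equations: $\mathbf{z}_1, \dots, \mathbf{z}_L$ denote the latent variables, $p$ the generative (decoder-side) distributions, $q$ the encoder-side distribution, $d$ a distortion measure, $\hat{\mathbf{x}}$ the reconstruction, and $\lambda$ a trade-off weight.

$$
p(\mathbf{x}, \mathbf{z}_1, \dots, \mathbf{z}_L) = p(\mathbf{x} \mid \mathbf{z}_1) \left[ \prod_{l=1}^{L-1} p(\mathbf{z}_l \mid \mathbf{z}_{l+1}) \right] p(\mathbf{z}_L)
$$

$$
\mathcal{L} = \mathbb{E}_{q}\left[ -\log_2 p(\mathbf{z}_L) - \sum_{l=1}^{L-1} \log_2 p(\mathbf{z}_l \mid \mathbf{z}_{l+1}) \right] + \lambda \, \mathbb{E}_{q}\left[ d(\mathbf{x}, \hat{\mathbf{x}}) \right]
$$

Under this reading, the first term is the total rate summed over all levels and the second is the weighted distortion. Setting $L=2$ leaves a single conditional $p(\mathbf{z}_1 \mid \mathbf{z}_2)$ and a prior $p(\mathbf{z}_2)$, which is the form of the 2-level hyperprior model the abstract refers to.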

Paper Structure

This paper contains 14 sections, 6 equations, 9 figures, and 1 table.

Figures (9)

  • Figure 1: Generalized nested latent variable model for lossy compression. Solid arrows denote direct computation of the latent variables $\mathbf{z}_l$ by the encoder and reconstruction of the input image $\mathbf{x}$ by the decoder, while dashed arrows denote the estimation of likelihood and prior distributions.
  • Figure 2: Proposed architecture for a generalized nested latent variable model. In the first layer, the decoder directly reconstructs the input image $\mathbf{x}$, while in the remaining layers it estimates the likelihood distributions. The network is assembled from building blocks, each composed of a convolution, a down/upsampling operation and a nonlinear function.
  • Figure 3: Four instances of wind turbine blade images. The pictures showcase distinct blade surfaces and locations with respect to the rotor of the turbine.
  • Figure 4: $L=3$
  • Figure 5: $L=4$
  • ...and 4 more figures