Image Compression Using Novel View Synthesis Priors

Luyuan Peng, Mandar Chitre, Hari Vishnu, Yuen Min Too, Bharath Kalyan, Rajat Mishra, Soo Pieng Tan

TL;DR

This work proposes a model-based image compression technique that leverages prior mission information. It employs trained machine-learning-based novel view synthesis models and uses gradient descent optimization to refine latent representations, so that the differences between camera images and rendered images are compressible.

Abstract

Real-time visual feedback is essential for tetherless control of remotely operated vehicles, particularly during inspection and manipulation tasks. Although acoustic communication is the preferred choice for medium-range underwater communication, its limited bandwidth makes it impractical to transmit images or videos in real time. To address this, we propose a model-based image compression technique that leverages prior mission information. Our approach employs trained machine-learning-based novel view synthesis models and uses gradient descent optimization to refine latent representations, which helps generate compressible differences between camera images and rendered images. We evaluate the proposed compression technique using a dataset from an artificial ocean basin, demonstrating superior compression ratios and image quality over existing techniques. Moreover, our method is robust to the introduction of new objects within the scene, highlighting its potential for advancing tetherless remotely operated vehicle operations.
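
The idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. A hypothetical `render` function stands in for a trained novel view synthesis model, a single scalar `theta` stands in for the latent representation, gradients are estimated by finite differences, and `zlib` stands in for whatever codec compresses the difference image. All names, sizes, and parameter values below are assumptions chosen for illustration.

```python
import math
import zlib

W, H = 64, 48  # toy image size (assumption)


def render(theta):
    """Hypothetical stand-in for a trained NVS model: latent -> image."""
    return [128.0 + 100.0 * math.sin(0.05 * (x + theta)) * math.cos(0.07 * y)
            for y in range(H) for x in range(W)]


def quantize(img):
    """Round a float image to integer pixel values."""
    return [int(round(v)) for v in img]


def mse(camera, theta):
    """Mean squared error between the camera image and the render at theta."""
    rendered = render(theta)
    return sum((c - r) ** 2 for c, r in zip(camera, rendered)) / len(camera)


def refine(camera, theta, lr=0.05, steps=60, h=1e-3):
    """Gradient descent on theta using a central finite-difference gradient."""
    for _ in range(steps):
        grad = (mse(camera, theta + h) - mse(camera, theta - h)) / (2 * h)
        theta -= lr * grad
    return theta


def encode(camera, theta):
    """Compress the difference between the camera image and the render."""
    pred = quantize(render(theta))
    # Offset by 128 so small signed differences fit in one unsigned byte.
    residual = bytes(c - p + 128 for c, p in zip(camera, pred))
    return zlib.compress(residual, 9)


def decode(payload, theta):
    """Rebuild the camera image from the render plus the decoded difference."""
    pred = quantize(render(theta))
    residual = zlib.decompress(payload)
    return [p + r - 128 for p, r in zip(pred, residual)]


theta_true = 10.0                       # latent that produced the camera image
camera = quantize(render(theta_true))   # simulated camera image
theta_init = theta_true + 3.0           # perturbed initial latent
theta_est = refine(camera, theta_init)

raw_size = len(zlib.compress(bytes(camera), 9))
coarse_size = len(encode(camera, theta_init))
refined_size = len(encode(camera, theta_est))
print(raw_size, coarse_size, refined_size)
```

In this toy setting the receiver needs only `theta_est` and the compressed difference, because it can run the same `render` locally. Refining the latent before taking the difference is what makes the difference image mostly constant and therefore cheap to compress.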

Paper Structure

This paper contains 18 sections, 3 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Effect of a 5° rotation error on the rendered image. The left image is the camera image, the middle image is the image rendered at the latent representation rotated by 5° about the x-axis, and the right image is the difference between the two images.
  • Figure 2: Classic image compression scheme and NVS-based image compression scheme.
  • Figure 3: Flow of iNVS.
  • Figure 4: Example images from the dataset: (a) An image from the mapping run (M1), showing the ROV surveying the original structure; (b) An image from test run 1 (T1), where the ROV continues to survey the same structure for performance evaluation; (c) An image from test run 2 (T2), featuring an additional metallic structure placed next to the original to test the robustness of our technique toward novel objects in the scene.
  • Figure 5: Performance of MSE loss and Matching Loss at different perturbation levels. Panel (a) compares the performance at different rotational perturbations of the initial latent representation. Panel (b) compares the performance at different translational perturbations.
  • ...and 4 more figures