DTR: A Unified Deep Tensor Representation Framework for Multimedia Data Recovery

Ting-Wei Zhou, Xi-Le Zhao, Jian-Li Wang, Yi-Si Luo, Min Wang, Xiao-Xuan Bai, Hong Yan

TL;DR

This paper introduces DTR, a unified deep tensor representation that couples a deep latent generative module $g_\theta(\cdot)$ with a deep transform module $f_\xi(\cdot)$ for multimedia data recovery. By modeling both intra- and inter-slice relations through an untrained neural generator and capturing frontal-slice dependencies via an untrained transform, DTR surpasses traditional shallow characterizations and transform-based methods. The authors formulate an unsupervised recovery objective and validate it on hyperspectral images (HSIs), multispectral images (MSIs), and videos, reporting improvements in PSNR/SSIM and in visual detail preservation. The work also offers insights into network choices, layer depths, and latent/tensor sizes, highlighting the practical value of combining deep latent generation with a nonlinear transform in tensor-based recovery tasks.

Abstract

Recently, transform-based tensor representation has attracted increasing attention in multimedia data (e.g., image and video) recovery problems. It consists of two indispensable components: transform and characterization. Previous development of transform-based tensor representation has mainly focused on the transform aspect. Although several attempts use shallow matrix factorization (e.g., singular value decomposition and nonnegative matrix factorization) to characterize the frontal slices of the transformed tensor (termed the latent tensor), faithful characterization remains underexplored. To address this issue, we propose a unified Deep Tensor Representation (DTR) framework that synergistically combines a deep latent generative module and a deep transform module. In particular, the deep latent generative module can generate the latent tensor more faithfully than shallow matrix factorization. The new DTR framework not only allows us to better understand classic shallow representations, but also leads us to explore new representations. To examine the representation ability of the proposed DTR, we consider the representative multi-dimensional data recovery task and suggest an unsupervised DTR-based multi-dimensional data recovery model. Extensive experiments demonstrate that DTR achieves superior performance compared to state-of-the-art methods both quantitatively and qualitatively, especially in recovering fine details.

Paper Structure (23 sections, 8 equations, 6 figures, 5 tables)

This paper contains 23 sections, 8 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The recovered pseudo-color images (R:10, G:20, B:30) by shallow characterization-based methods, i.e., t-SVD (based on SVD) zhang2014novel and HLRTF (based on nonnegative matrix factorization) luo2022hlrtf, and by a deep characterization-based method, i.e., the proposed DTR.
  • Figure 2: Diagram of our DTR framework. The deep latent generative module $g_\theta(\cdot)$ generates the latent tensor and the deep transform module $f_\xi(\cdot)$ captures the frontal slice relationships of multi-dimensional data.
  • Figure 3: The recovered pseudo-color images by different methods. From top to bottom: MSI Cd (R:10, G:20, B:30), Beads (R:10, G:20, B:30), Feathers (R:31, G:15, B:4), and Flowers (R:31, G:15, B:4) under random missing (sampling rate SR = 0.1).
  • Figure 4: The 10th frame of the results recovered by all compared methods on the videos Sunflower and Bird under random missing (SR = 0.3).
  • Figure 5: The recovered pseudo-color images by different methods. From top to bottom: MSI Cd (R:10, G:20, B:30), Beads (R:10, G:20, B:30), Feathers (R:31, G:15, B:4), and Flowers (R:31, G:15, B:4) under tube missing (SR = 0.3).
  • ...and 1 more figure

Theorems & Definitions (5)

  • Definition 1: Mode-3 Unfolding doi:10.1137/07070111X
  • Definition 2: Mode-3 Tensor-Matrix Product doi:10.1137/07070111X
  • Definition 3: Tensor Tubal-Rank kilmer2013third
  • Definition 4: T-Product kilmer2013third
  • Remark 1
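
The definitions listed above are named here but not stated. As a rough illustration, the NumPy sketch below follows the standard formulations of mode-3 unfolding, the mode-3 tensor-matrix product, the t-product, and the tubal rank from the tensor literature; it may differ in detail from the paper's own statements. The function names, the column ordering used in the unfolding, and the rank tolerance `tol` are assumptions made for this example.

```python
import numpy as np

def unfold_mode3(x):
    """Mode-3 unfolding: (n1, n2, n3) -> (n3, n1*n2), mode-3 fibers as columns."""
    n1, n2, n3 = x.shape
    return np.reshape(np.moveaxis(x, 2, 0), (n3, n1 * n2), order="F")

def fold_mode3(m, n1, n2):
    """Inverse of unfold_mode3: (n3, n1*n2) -> (n1, n2, n3)."""
    return np.moveaxis(np.reshape(m, (m.shape[0], n1, n2), order="F"), 0, 2)

def mode3_product(x, a):
    """Mode-3 tensor-matrix product of x (n1, n2, n3) with a (m, n3)."""
    n1, n2, _ = x.shape
    return fold_mode3(a @ unfold_mode3(x), n1, n2)

def t_product(a, b):
    """T-product of a (n1, n2, n3) and b (n2, n4, n3) via the FFT along mode 3."""
    a_hat = np.fft.fft(a, axis=2)
    b_hat = np.fft.fft(b, axis=2)
    # Multiply matching frontal slices in the Fourier domain.
    c_hat = np.einsum("ijk,jlk->ilk", a_hat, b_hat)
    return np.real(np.fft.ifft(c_hat, axis=2))

def tubal_rank(x, tol=1e-8):
    """Largest numerical rank among the Fourier-domain frontal slices of x."""
    x_hat = np.moveaxis(np.fft.fft(x, axis=2), 2, 0)  # (n3, n1, n2)
    s = np.linalg.svd(x_hat, compute_uv=False)         # (n3, min(n1, n2))
    return int(np.max(np.sum(s > tol * s.max(), axis=1)))

rng = np.random.default_rng(0)

# A linear transform along the third mode, written as a mode-3 product.
x = rng.standard_normal((4, 5, 3))
a = rng.standard_normal((2, 3))
y = mode3_product(x, a)          # shape (4, 5, 2)

# A t-product of two thin tensors has tubal rank at most 2.
p = rng.standard_normal((4, 2, 3))
q = rng.standard_normal((2, 5, 3))
c = t_product(p, q)              # shape (4, 5, 3)
print(y.shape, c.shape, tubal_rank(c))
```

In this sketch the mode-3 product is the linear special case of a transform acting on frontal slices, and the t-product corresponds to fixing that transform to the discrete Fourier transform along the third mode.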