Learning Disentangled Representations for Perceptual Point Cloud Quality Assessment via Mutual Information Minimization

Ziyu Shan, Yujie Zhang, Yipeng Liu, Yiling Xu

TL;DR

DisPA is a novel disentangled representation learning framework for NR-PCQA. It uses an MI estimator to estimate a tight upper bound of the actual mutual information between content and distortion representations and minimizes that bound to achieve explicit representation disentanglement. It outperforms state-of-the-art methods on multiple PCQA datasets.
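The summary does not name the MI estimator. The sketch below assumes a CLUB-style estimator (contrastive log-ratio upper bound, Cheng et al., 2020), a common choice for minimizing an MI upper bound. It takes a Gaussian variational distribution q(y|x) = N(mu(x), var·I), where `xs` stand for content representations, `ys` for distortion representations, and `mu` is a hypothetical stand-in for the learned predictor network. The bound is the mean log-likelihood of matched pairs minus the mean over all pairs, so the Gaussian normalizing constants cancel.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def club_upper_bound(
    xs: Sequence[Vector],
    ys: Sequence[Vector],
    mu: Callable[[Vector], Vector],
    var: float = 1.0,
) -> float:
    """Sampled CLUB-style estimate of I(x; y) (assumed estimator, not the paper's code).

    q(y|x) is assumed Gaussian with mean mu(x) and fixed isotropic variance `var`,
    so log q differs from -||y - mu(x)||^2 / (2 var) only by a constant that
    cancels between the two terms below.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and paired")
    preds = [mu(x) for x in xs]
    # Mean of log q(y_i | x_i) over matched (positive) pairs, up to a constant.
    positive = sum(-_sq_dist(ys[i], preds[i]) for i in range(n)) / (2 * var * n)
    # Mean of log q(y_j | x_i) over all n*n pairs, up to the same constant.
    negative = sum(
        -_sq_dist(ys[j], preds[i]) for i in range(n) for j in range(n)
    ) / (2 * var * n * n)
    return positive - negative
```

For example, `club_upper_bound([[0.0], [1.0]], [[0.0], [1.0]], mu=lambda x: x)` returns 0.25, while a constant `ys` gives 0.0 because y then carries no information about x. In a full training loop (again an assumption), `mu` would be fitted to maximize log q(y_i|x_i) on matched pairs, and the two encoders would be updated to minimize this estimate.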

Abstract

No-Reference Point Cloud Quality Assessment (NR-PCQA) aims to objectively assess the human perceptual quality of point clouds without relying on pristine-quality point clouds for reference. It is becoming increasingly significant with the rapid advancement of immersive media applications such as virtual reality (VR) and augmented reality (AR). However, current NR-PCQA models learn point cloud content and distortion representations indiscriminately within a single network, overlooking their distinct contributions to quality. To address this issue, we propose DisPA, a novel disentangled representation learning framework for NR-PCQA. The framework trains a dual-branch disentanglement network to minimize the mutual information (MI) between representations of point cloud content and distortion. Specifically, to fully disentangle the representations, the two branches adopt different philosophies: the content-aware encoder is pretrained with a masked autoencoding strategy, which allows it to capture semantic information from rendered images of distorted point clouds; the distortion-aware encoder takes a mini-patch map as input, which forces it to focus on low-level distortion patterns. Furthermore, we use an MI estimator to estimate a tight upper bound of the actual MI and minimize that bound to achieve explicit representation disentanglement. Extensive experimental results demonstrate that DisPA outperforms state-of-the-art methods on multiple PCQA datasets.
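The abstract states only that the content-aware encoder is pretrained with masked autoencoding on rendered images. A minimal sketch of the two generic ingredients of such pretraining, assuming the standard MAE recipe: a random split of image-patch indices into visible and masked sets, and a reconstruction loss computed on masked patches only. The 0.75 mask ratio and the 196-patch count (a 14 × 14 grid) are illustrative MAE-style defaults, not values reported for DisPA, and the function names are hypothetical.

```python
import random


def random_mask(num_patches, mask_ratio=0.75, rng=None):
    """Split patch indices into (visible, masked) lists, MAE-style.

    Only visible patches would be fed to the encoder; a decoder would then
    reconstruct the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = rng if rng is not None else random.Random(0)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def masked_mse(pred, target, masked):
    """Mean squared reconstruction error averaged over masked patches only."""
    if not masked:
        raise ValueError("no masked patches")
    total = 0.0
    for idx in masked:
        p, t = pred[idx], target[idx]
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)
    return total / len(masked)
```

With 196 patches and a 0.75 ratio, 49 patches stay visible and 147 are masked. Restricting the loss to masked patches means the encoder is rewarded for inferring hidden content from context, which is the property that pushes it toward semantic features.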

Paper Structure

This paper contains 34 sections, 15 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Statistics of SJTU-PCQA (part) yang2020predicting and quality scores predicted by NR-PCQA models (PQA-Net liu2021pqa and GPA-Net shan2023gpa). Quality scores for different distortion types are plotted as lines of different colors. Red circles highlight the score span across different contents under the same distortion.
  • Figure 2: Architecture of proposed DisPA (a). Our DisPA consists of two encoders $\mathcal{F}$ and $\mathcal{G}$ for learning content-aware and distortion-aware representations, and an MI estimator $\mathcal{M}$. The content-aware encoder $\mathcal{F}$ is pretrained using masked autoencoding (b). "$\bigoplus$" denotes concatenation.
  • Figure 3: Illustration of mini-patch map generation.
  • Figure 4: Statistical analysis of SJTU-PCQA (part) and quality scores predicted by DisPA.
  • Figure 5: Qualitative evaluation of NR-PCQA methods (PQA-Net liu2021pqa, CoPA shan2024contrastive and DisPA) on SJTU-PCQA yang2020predicting and WPC liu2022perceptual. Panels (b)-(d) share the same distortion type (i.e., color noise), as do panels (f)-(h) (i.e., downsampling). "GT" denotes ground truth.
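Figure 3 illustrates mini-patch map generation, but the captions give no parameters. A minimal sketch, assuming a grid mini-patch sampler similar to the fragment sampling used in FAST-VQA: the rendered image is divided into a `grid` × `grid` layout, one `patch` × `patch` crop is taken at a random offset inside each cell at native resolution (no resizing, so local distortion statistics survive), and the crops are stitched in grid order. The function name, `grid`, `patch`, and the uniform random offset are hypothetical choices, not details from the paper.

```python
import random


def mini_patch_map(image, grid=4, patch=2, rng=None):
    """Stitch one patch x patch crop from each cell of a grid x grid partition.

    `image` is a 2D list (H rows x W columns) of pixel values. Pixels left over
    when H or W is not divisible by `grid` are ignored.
    """
    rng = rng if rng is not None else random.Random(0)
    h, w = len(image), len(image[0])
    cell_h, cell_w = h // grid, w // grid
    if cell_h < patch or cell_w < patch:
        raise ValueError("patch is larger than a grid cell")
    size = grid * patch
    out = [[None] * size for _ in range(size)]
    for gi in range(grid):
        for gj in range(grid):
            # Random top-left corner that keeps the crop inside cell (gi, gj).
            top = gi * cell_h + rng.randint(0, cell_h - patch)
            left = gj * cell_w + rng.randint(0, cell_w - patch)
            # Copy the crop pixel-for-pixel into its slot in the output map.
            for di in range(patch):
                for dj in range(patch):
                    out[gi * patch + di][gj * patch + dj] = image[top + di][left + dj]
    return out
```

Because every crop keeps its original pixels while the spatial layout of the whole object is broken up, a sampler like this discourages the encoder from relying on global semantics, which matches the stated goal of focusing the distortion-aware branch on low-level patterns.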