PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion

Runsong Zhu, Shi Qiu, Qianyi Wu, Ka-Hei Hui, Pheng-Ann Heng, Chi-Wing Fu

TL;DR

PCF-Lift is a new pipeline built on Probabilistic Contrastive Fusion (PCF). It learns and embeds probabilistic features throughout the pipeline to account for inaccurate segmentations and inconsistent instance IDs, and the authors provide a theoretical analysis to justify the advantage of the probabilistic solution.

Abstract

Panoptic lifting is an effective technique to address the 3D panoptic segmentation task by unprojecting 2D panoptic segmentations from multiple views to a 3D scene. However, the quality of its results largely depends on the 2D segmentations, which can be noisy and error-prone, so its performance often drops significantly for complex scenes. In this work, we design a new pipeline, PCF-Lift, based on our Probabilistic Contrastive Fusion (PCF), to learn and embed probabilistic features throughout the pipeline and thereby actively account for inaccurate segmentations and inconsistent instance IDs. Technically, we first model the probabilistic feature embeddings as multivariate Gaussian distributions. To fuse the probabilistic features, we incorporate the probability product kernel into the contrastive loss formulation and design a cross-view constraint to enhance feature consistency across different views. For inference, we introduce a new probabilistic clustering method that associates prototype features with the underlying 3D object instances to generate consistent panoptic segmentation results. Further, we provide a theoretical analysis to justify the superiority of the proposed probabilistic solution. In extensive experiments, PCF-Lift not only significantly outperforms state-of-the-art methods on widely used benchmarks, including the ScanNet dataset and the challenging Messy Room dataset (4.4% improvement in scene-level PQ), but also demonstrates strong robustness when incorporating various 2D segmentation models or different levels of hand-crafted noise.

Paper Structure

This paper contains 31 sections, 1 theorem, 6 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

Corollary 1

If the covariances of the given Gaussian distributions are isotropic and fixed, i.e., $\Sigma_{i} = \Sigma_{j} = \sigma \mathbf{I}$, where $\sigma$ is a constant scalar, the probability product kernel can be simplified to an RBF kernel.
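The corollary can be checked numerically. The sketch below is not taken from the paper: it assumes the probability product kernel with exponent $\rho = 1$ (the expected likelihood kernel), for which the kernel between two Gaussians has the closed form $\mathcal{N}(\mu_{i}; \mu_{j}, \Sigma_{i} + \Sigma_{j})$. The paper may use a different exponent or normalization, and the function names `pp_kernel_gaussian` and `rbf_kernel` are hypothetical. Under that assumption, setting $\Sigma_{i} = \Sigma_{j} = \sigma \mathbf{I}$ gives $(4\pi\sigma)^{-D/2} \exp(-\lVert \mu_{i} - \mu_{j} \rVert^{2} / (4\sigma))$, i.e. an RBF kernel with bandwidth parameter $\gamma = 1/(4\sigma)$ times a constant.

```python
import numpy as np


def pp_kernel_gaussian(mu_i, cov_i, mu_j, cov_j):
    """Probability product kernel (assumed rho = 1) between two Gaussians.

    Evaluates the integral of N(x; mu_i, cov_i) * N(x; mu_j, cov_j) over x,
    which equals N(mu_i; mu_j, cov_i + cov_j).
    """
    dim = mu_i.shape[0]
    cov_sum = cov_i + cov_j
    diff = mu_i - mu_j
    # Squared Mahalanobis distance under the summed covariance.
    maha = diff @ np.linalg.solve(cov_sum, diff)
    # Log-determinant is more stable than det() for small covariances.
    _, logdet = np.linalg.slogdet(cov_sum)
    log_k = -0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)
    return float(np.exp(log_k))


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel exp(-gamma * ||x - y||^2)."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))


# Two Gaussian embeddings sharing the isotropic covariance sigma * I.
rng = np.random.default_rng(0)
dim, sigma = 3, 0.5
mu_a = rng.normal(size=dim)
mu_b = rng.normal(size=dim)
cov = sigma * np.eye(dim)

pp = pp_kernel_gaussian(mu_a, cov, mu_b, cov)

# Constant factor and bandwidth implied by the isotropic, fixed covariance.
scale = (4.0 * np.pi * sigma) ** (-dim / 2.0)
rbf = scale * rbf_kernel(mu_a, mu_b, gamma=1.0 / (4.0 * sigma))

print(np.isclose(pp, rbf))
```

With a learned, per-feature covariance the normalization and the Mahalanobis term both depend on the inputs, so the similarity no longer reduces to a function of the distance between means alone; this is the extra flexibility that the fixed isotropic case gives up.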

Figures (6)

  • Figure 1: Our PCF-Lift method unprojects 2D panoptic segmentation predictions to the 3D domain, facilitating the generation of consistent panoptic segmentation masks. For simplicity and clarity, we highlight instance segmentation masks.
  • Figure 2: Overview of PCF-Lift. The 3D panoptic fields include a semantic field, an instance field, a density field, and a color field. To solve the instance-related issues, we propose to learn probabilistic feature embeddings in the instance field (see Sec. \ref{sec:embedding}). During the training phase, given two camera views, we can render the probabilistic feature maps from the instance field via volume rendering. To optimize the probabilistic instance field, we devise the probabilistic contrastive loss with the Probability Product (PP) kernel \cite{jebara2004probability}, and propose a cross-view constraint to further enhance the feature consistency from different views (see Sec. \ref{sec:contrast-fusion}). Similarly, we can render the semantic and color predictions, and adopt photometric loss and cross-entropy loss to optimize the semantic field, the density field, and the color field. During the inference phase, we design a novel multi-view object association (MVOA) algorithm for the generation of consistent panoptic segmentations (see Sec. \ref{sec:inference}).
  • Figure 3: (a) Flexibility of adjusting covariances. Contour plot of the PP kernel similarity for two Gaussians with different covariance values ($\sigma_{1}$ and $\sigma_{2}$) and fixed mean values in a 1-dimensional case. (b) Anisotropy. Contour plot of the PP kernel similarity for two Gaussians with different Gaussian mean offsets ($d_{x}$ and $d_{y}$) and fixed covariances in a 2-dimensional case. (c) Isotropy. Contour plot of the RBF kernel similarity for two deterministic features with different offsets ($d_{x}$ and $d_{y}$) in a 2-dimensional case.
  • Figure 4: Visual comparison of the latest state-of-the-art method Contrastive Lift \cite{bhalgat2023contrastive} and our method on the ScanNet \cite{dai2017scannet} dataset and the Messy Room \cite{bhalgat2023contrastive} dataset.
  • Figure 5: Visualization of the learned covariance components and covariance statistics in two scenes of the Messy Room dataset \cite{bhalgat2023contrastive}. For the histograms, the horizontal axis denotes the range of covariance magnitudes, while the vertical axis gives the frequency of those magnitudes. We calculate and plot the covariance magnitudes ($(\sigma^{(1)})^{2} \cdot (\sigma^{(2)})^{2} \cdot (\sigma^{(3)})^{2}$) of two distinct image regions, the boundary areas and the internal areas of object instances, across all observed views.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Corollary 1