Intrinsic Image Decomposition Using Point Cloud Representation

Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers

TL;DR

This work tackles intrinsic image decomposition (IID) by moving from 2D imagery to 3D point-cloud representations. It introduces PoInt-Net, a point-based network with three specialized components—Point Albedo-Net, Light Direction Estimation Net, and Learnable Shader—to jointly estimate albedo and shading from colored point clouds, using a two-stage training regime. The approach demonstrates strong efficiency (fewer parameters) and zero-shot generalization, achieving state-of-the-art or competitive results across ShapeNet-Intrinsic, MIT-Intrinsic, MPI-Sintel, Inverender, and IIW datasets, even when trained only on single-object scenes. The findings highlight the advantages of point-cloud-based priors for IID, robust performance under noisy depth, and practical applicability to real-world scenes, with limitations noted and paths for future work outlined.

Abstract

The purpose of intrinsic decomposition is to separate an image into its albedo (reflective properties) and shading components (illumination properties). This is challenging because it's an ill-posed problem. Conventional approaches primarily concentrate on 2D imagery and fail to fully exploit the capabilities of 3D data representation. 3D point clouds offer a more comprehensive format for representing scenes, as they combine geometric and color information effectively. To this end, in this paper, we introduce Point Intrinsic Net (PoInt-Net), which leverages 3D point cloud data to concurrently estimate albedo and shading maps. The merits of PoInt-Net include the following aspects. First, the model is efficient, achieving consistent performance across point clouds of any size with training only required on small-scale point clouds. Second, it exhibits remarkable robustness; even when trained exclusively on datasets comprising individual objects, PoInt-Net demonstrates strong generalization to unseen objects and scenes. Third, it delivers superior accuracy over conventional 2D approaches, demonstrating enhanced performance across various metrics on different datasets. (Code Released)
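
To make the ill-posedness concrete, here is a minimal sketch assuming the element-wise Lambertian model (image = albedo × shading) that the Figure 2 caption describes. The 2×2 arrays and the scale factor `k` are illustrative values only, not data from the paper.

```python
import numpy as np

# Hypothetical 2x2 single-channel example; all values are illustrative.
albedo = np.array([[0.2, 0.8], [0.5, 0.5]])
shading = np.array([[1.0, 0.5], [0.4, 0.8]])
image = albedo * shading  # assumed model: I = A * S, element-wise

# Scale albedo by k and shading by 1/k: a different decomposition
# that reproduces exactly the same image.
k = 2.0
alt_albedo = albedo * k
alt_shading = shading / k
alt_image = alt_albedo * alt_shading

same = np.allclose(image, alt_image)
```

Both pairs explain the observed image equally well, so a decomposition method needs additional priors (here, geometry from the point cloud) to choose between them.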

Paper Structure (49 sections, 10 equations, 22 figures, 3 tables)

This paper contains 49 sections, 10 equations, 22 figures, 3 tables.

Figures (22)

  • Figure 1: Intrinsic image decomposition using a 3D point cloud representation. Our approach decomposes the intrinsic components of an object/scene based on the point cloud representation of its appearance from a particular viewing angle. Point clouds are generated from RGB-D images, where the depth maps are either captured by a depth camera (e.g., LiDAR or ToF) or estimated by a monocular depth estimation method such as Ranftl2022.
  • Figure 2: Our proposed framework for intrinsic point cloud decomposition starts by transforming the RGB-D representation into a point cloud representation (a). The point cloud representation is used as input to train two separate components: the shading and the albedo estimations. The shading estimation is supported by the DirectionNet (Light Direction Estimation Net), which takes (a) as input and outputs surface light direction estimates (c). Surface normals (d) are calculated using local neighborhoods within (a). The Shader (Learnable Shader) then uses the concatenated vectors of (c) and (d) to generate the final shading estimation (e). The albedo estimation is obtained by the AlbedoNet (Point-Albedo Net), which extracts invariant reflectance (b) from (a) based on the Lambertian assumption. Finally, by multiplying (b) and (e), the reconstructed image (f) is generated. Please refer to the supplementary for a more detailed explanation.
  • Figure 3: Comparison to state-of-the-art method USI3D (fine-tuned version) liu2020cvpr and ablation study on the ShapeNet-intrinsic dataset. Zoom to see details.
  • Figure 4: Results and ablation study for ShapeNet-Intrinsic janner2017self.
  • Figure 5: Qualitative results on the MIT-intrinsic benchmark grosse2009ground. Comparison to the state-of-the-art method PIE-Net das2022pie. Ablation study on shader is conducted.
  • ...and 17 more figures
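
The data flow described in the captions for Figures 1 and 2 can be sketched roughly as follows. This is not the authors' implementation, and several pieces are assumptions made for illustration: the pinhole intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical; normals are estimated with PCA over brute-force nearest neighbours (the caption only says "local neighborhoods"); a fixed light direction and a plain Lambertian dot product stand in for the learned DirectionNet and Learnable Shader; and the input colors stand in for an AlbedoNet prediction.

```python
import numpy as np

def backproject(rgb, depth, fx, fy, cx, cy):
    """Lift an RGB-D image to an (N, 6) colored point cloud [x, y, z, r, g, b]."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx  # pinhole camera model (assumed)
    y = (v - cy) * depth / fy
    xyz = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return np.concatenate([xyz, rgb.reshape(-1, 3)], axis=1)

def estimate_normals(xyz, k=8):
    """Per-point normal via PCA over the k nearest neighbours (brute force)."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]
    normals = np.empty_like(xyz)
    for i, nbrs in enumerate(idx):
        pts = xyz[nbrs] - xyz[nbrs].mean(axis=0)
        # eigh sorts eigenvalues ascending; the first eigenvector is the
        # direction of least variance, i.e. the local surface normal.
        _, vecs = np.linalg.eigh(pts.T @ pts)
        n = vecs[:, 0]
        # Orient the normal toward the camera at the origin.
        normals[i] = -n if n @ xyz[i] > 0 else n
    return normals

def lambertian_shading(normals, light_dir):
    """Stand-in for the learned shader: clamped cosine between normal and light."""
    l = light_dir / np.linalg.norm(light_dir)
    return np.clip(normals @ l, 0.0, None)

def reconstruct(albedo, shading):
    """Per-point reconstruction: albedo (N, 3) times scalar shading (N,)."""
    return albedo * shading[:, None]

# Toy usage: a flat 8x8 wall two units in front of the camera.
rgb = np.full((8, 8, 3), 0.6)
depth = np.full((8, 8), 2.0)
cloud = backproject(rgb, depth, fx=10.0, fy=10.0, cx=3.5, cy=3.5)
normals = estimate_normals(cloud[:, :3])
shading = lambertian_shading(normals, np.array([0.0, 0.0, -1.0]))
albedo = cloud[:, 3:]  # stand-in for an AlbedoNet prediction
recon = reconstruct(albedo, shading)
```

In the framework described above, the light direction is predicted per point and the shading comes from a trained network, so the closed-form shading here only illustrates how (c), (d), (b), and (e) combine into (f).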