Learning Photometric Feature Transform for Free-form Object Scan

Xiang Feng, Kaizhang Kang, Fan Pei, Huakeng Ding, Jinjiang You, Ping Tan, Kun Zhou, Hongzhi Wu

TL;DR

This work presents a data-driven framework for free-form object scanning that learns to aggregate and transform photometric measurements from unstructured views into view-invariant features, which are then fed to a multi-view stereo (MVS) pipeline. By jointly optimizing illumination patterns and a Feature Transform Network (FTN) on synthetic data, the method reconstructs both geometry and anisotropic reflectance, even from handheld acquisition. The FTN acts as a modular preprocessing stage that can improve existing MVS backends. The method is demonstrated on a handheld LED-array scanner and an iPad tablet, with geometry and SVBRDF results competitive with those of professional scanners and state-of-the-art methods. By actively probing the angular domain with its lighting patterns, the approach improves the sampling of complex appearance, yielding practical gains for free-form object digitization.

Abstract

We propose a novel framework to automatically learn to aggregate and transform photometric measurements from multiple unstructured views into spatially distinctive and view-invariant low-level features, which are subsequently fed to a multi-view stereo pipeline to enhance 3D reconstruction. The illumination conditions during acquisition and the feature transform are jointly trained on a large amount of synthetic data. We further build a system to reconstruct both the geometry and anisotropic reflectance of a variety of challenging objects from hand-held scans. The effectiveness of the system is demonstrated with a lightweight prototype, consisting of a camera and an array of LEDs, as well as an off-the-shelf tablet. Our results are validated against reconstructions from a professional 3D scanner and photographs, and compare favorably with state-of-the-art techniques.
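Jointly training the illumination conditions with the feature transform requires simulating what the camera records under a given lighting pattern. The following is a minimal sketch, assuming the standard linear illumination-multiplexing model: a lumitexel (the vector of one surface point's responses to each individual light, the input named in Figure 4) is projected onto each lighting pattern, giving one measurement per pattern. The light count, the numbers, and the function names are hypothetical; during training, the pattern entries would be the learnable parameters.

```python
from typing import List, Sequence


def simulate_measurement(lumitexel: Sequence[float], pattern: Sequence[float]) -> float:
    """Pixel value under one lighting pattern, assuming linear light transport:
    each light's response is scaled by its pattern intensity and summed."""
    if len(lumitexel) != len(pattern):
        raise ValueError("lumitexel and pattern must cover the same set of lights")
    if any(p < 0.0 for p in pattern):
        # Physical light sources cannot emit negative intensity.
        raise ValueError("pattern intensities must be non-negative")
    return sum(l * p for l, p in zip(lumitexel, pattern))


def encode_group(lumitexel: Sequence[float], patterns: Sequence[Sequence[float]]) -> List[float]:
    """One simulated measurement per pattern, e.g. the 5 patterns of an image group."""
    return [simulate_measurement(lumitexel, p) for p in patterns]


if __name__ == "__main__":
    # Hypothetical 4-light device and 5 patterns (one per frame in a group).
    lumitexel = [0.2, 0.5, 0.1, 0.0]
    patterns = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0],
        [0.5, 0.5, 0.5, 0.5],
        [0.0, 0.0, 1.0, 1.0],
    ]
    print(encode_group(lumitexel, patterns))
```

Because the model is linear in both the lumitexel and the pattern, gradients flow to the pattern entries, which is what makes optimizing the lighting jointly with a network feasible in principle.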

Paper Structure (25 sections, 7 equations, 66 figures, 1 table)

This paper contains 25 sections, 7 equations, 66 figures, 1 table.

Figures (66)

  • Figure 1: Using an illumination-multiplexing device, such as a lightweight prototype consisting of a single camera and a programmable light array (a) or an off-the-shelf tablet (b), we propose a system that learns to acquire with pre-optimized time-varying lighting patterns (the bottom insets in (a) and (b)) at unstructured views, and reconstruct both the geometry (c) and complex anisotropic reflectance (d) of a number of challenging objects. Please refer to the supplementary video for animated rendering results.
  • Figure 3: Our runtime pipeline. First, we partition the continuously captured images into groups of 5, with each image in a group acquired under a different lighting pattern. Next, we crop patches from each image in the group, centered at the same pixel location. A Feature Transform Network then transforms these data into a per-pixel high-dimensional feature at that location; the collection of these features forms a feature map for the center view. We feed the feature maps from every group into a multi-view stereo pipeline for 3D reconstruction. With the computed shape, the appearance of the object is differentiably optimized with respect to all input images and then stored as texture maps of GGX BRDF parameters. FTN = Feature Transform Network.
  • Figure 4: Our network (a) and warping illustration (b). The network (a) takes as input the lumitexels corresponding to all pixels in the 5 patches of a group, and encodes them as measurements by simulating lighting pattern projections. The measurements of each view are warped to a separate measurement volume (b), all defined with respect to the same center view. The 5 volumes are then aggregated and transformed into a feature vector by combining the outputs of an unnormalized and a normalized network branch, which share the same architecture. (b) shows a 2D warping example for a patch at the i-th view. First, a measurement volume is set up with respect to the center view. We then fill each voxel by projecting its center onto the patch and fetching the corresponding image measurement. The acquisition condition, including view and lighting information, is also stored. Meas. = measurement, Acq. = acquisition, Cond. = condition.
  • Figure 5: Network architecture. Our network takes as input the lumitexels corresponding to all pixels in the 5 patches of a group, and encodes them as measurements by simulating lighting pattern projections. The measurements of each view are warped to a measurement volume. The 5 volumes are aggregated and transformed into a feature vector by combining the outputs of an unnormalized and a normalized branch that share the same structure. The data dimensions are given in the corresponding blocks; in the feature transform module, the depth dimension is additionally given on top of each block. In the aggregation module, we loop over the depth hypotheses; at each one, the 5 iso-depth slices (one from each measurement volume) are transformed into a 20D intermediate feature. Finally, the intermediate features at all 128 depth hypotheses are aggregated into a final 12D feature.
  • Figure 6: Our training dataset of 15 high-quality objects, digitized by a commercial 3D scanner and a professional light stage [Kang:2019:JOINT].
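The first two steps of the runtime pipeline in Figure 3, grouping the frame stream and cropping co-located patches, can be sketched as follows. This is a minimal sketch under stated assumptions, none of which come from the paper: images are plain nested lists, incomplete trailing groups are dropped, out-of-image pixels are zero-padded, and the "center view" is taken to be the middle frame of a group. All function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
Image = List[List[float]]

GROUP_SIZE = 5  # Figure 3: each group holds 5 frames, one per lighting pattern


def partition_frames(frames: Sequence[T], group_size: int = GROUP_SIZE) -> List[List[T]]:
    """Split a continuous capture into consecutive, non-overlapping groups.
    Trailing frames that cannot fill a whole group are dropped (an assumption)."""
    n_full = len(frames) // group_size
    return [list(frames[g * group_size:(g + 1) * group_size]) for g in range(n_full)]


def center_index(group_size: int = GROUP_SIZE) -> int:
    """Index of the frame treated as the center view (assumed: the middle frame)."""
    return group_size // 2


def crop_patch(image: Image, row: int, col: int, half: int) -> Image:
    """Crop a (2*half+1) x (2*half+1) patch centered at (row, col).
    Pixels outside the image are zero-padded (a hypothetical border policy)."""
    h, w = len(image), len(image[0])
    return [
        [image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
         for c in range(col - half, col + half + 1)]
        for r in range(row - half, row + half + 1)
    ]


def crop_group(group: Sequence[Image], row: int, col: int, half: int) -> List[Image]:
    """Patches from every frame of a group, all centered at the same pixel."""
    return [crop_patch(img, row, col, half) for img in group]
```

For example, with 12 captured frames, `partition_frames(list(range(12)))` returns the two full groups `[0, 1, 2, 3, 4]` and `[5, 6, 7, 8, 9]` and discards frames 10 and 11.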
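Figure 4(b) fills each voxel of a measurement volume by projecting its center into the i-th view and fetching the image measurement there. The following is a minimal sketch, assuming pinhole cameras with shared intrinsics `(fx, fy, cx, cy)`, a pose `(R, t)` that maps center-view camera coordinates into view i, voxels parameterized by a center-view pixel `(u, v)` and a depth, and bilinear sampling from the full view-i image rather than a cropped patch. All names are hypothetical. Alongside the measurement, the sketch returns the unit direction from the 3D point toward camera i, as one example of the acquisition condition the caption says is stored.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Image = List[List[float]]


def mat_vec(m: Mat3, v: Sequence[float]) -> Vec3:
    return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))


def mat_t_vec(m: Mat3, v: Sequence[float]) -> Vec3:
    """Multiply v by the transpose of m."""
    return tuple(sum(m[k][r] * v[k] for k in range(3)) for r in range(3))


def bilinear(img: Image, x: float, y: float) -> Optional[float]:
    """Sample img at column x, row y (pixel centers at integer coordinates)."""
    h, w = len(img), len(img[0])
    if not (0.0 <= x <= w - 1 and 0.0 <= y <= h - 1):
        return None
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = img[y0][x0] * (1 - ax) + img[y0][x1] * ax
    bottom = img[y1][x0] * (1 - ax) + img[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def warp_voxel(u: float, v: float, depth: float,
               intrinsics: Tuple[float, float, float, float],
               rot: Mat3, trans: Sequence[float],
               image_i: Image) -> Optional[Tuple[float, Vec3]]:
    """Return (measurement, unit direction to camera i), or None if not visible."""
    fx, fy, cx, cy = intrinsics
    # Back-project the voxel center from the center view into 3D.
    p = (depth * (u - cx) / fx, depth * (v - cy) / fy, depth)
    # Move it into camera i's coordinates: p_i = R p + t.
    p_i = tuple(a + b for a, b in zip(mat_vec(rot, p), trans))
    if p_i[2] <= 0.0:
        return None  # behind camera i
    # Project onto view i's image plane and fetch the measurement.
    value = bilinear(image_i, fx * p_i[0] / p_i[2] + cx, fy * p_i[1] / p_i[2] + cy)
    if value is None:
        return None  # falls outside view i
    # Camera i's center in center-view coordinates solves R c + t = 0, i.e. c = -R^T t.
    cam = tuple(-c for c in mat_t_vec(rot, trans))
    d = tuple(c - q for c, q in zip(cam, p))
    norm = math.sqrt(sum(c * c for c in d))
    return value, tuple(c / norm for c in d)
```

Repeating `warp_voxel` for every depth hypothesis and every pixel of the center-view patch would fill one view's measurement volume; nearest-neighbor lookup would work as well, and bilinear sampling is simply one common choice.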
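Figure 5 describes the aggregation as a loop over depth: at each of 128 depth hypotheses, 5 iso-depth slices (one per measurement volume) become a 20D intermediate feature, and the 128 intermediate features are reduced to a final 12D feature. The following is a shape-level sketch only. The learned modules are replaced by hypothetical random linear maps with no nonlinearity, the normalized branch is omitted, and the flattened slice layout is an assumption. It illustrates how the dimensions flow (5 slices to 20D per depth, then 128 × 20 = 2560 values to 12D), not what the trained network computes.

```python
import random
from typing import List, Sequence

N_VIEWS = 5     # measurement volumes per group (Figure 5)
N_DEPTHS = 128  # depth hypotheses (Figure 5)
MID_DIM = 20    # per-depth intermediate feature (Figure 5)
OUT_DIM = 12    # final per-pixel feature (Figure 5)


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_weights(out_dim: int, in_dim: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def aggregate_pixel(volumes: Sequence[Sequence[Sequence[float]]], seed: int = 0) -> List[float]:
    """volumes[view][depth] is a flattened iso-depth slice (hypothetical layout)."""
    if len(volumes) != N_VIEWS or any(len(vol) != N_DEPTHS for vol in volumes):
        raise ValueError("expected 5 volumes with 128 depth slices each")
    rng = random.Random(seed)
    slice_len = len(volumes[0][0])
    # Hypothetical stand-ins for the learned modules; one map is shared by all depths.
    w_mid = random_weights(MID_DIM, N_VIEWS * slice_len, rng)
    w_out = random_weights(OUT_DIM, N_DEPTHS * MID_DIM, rng)
    intermediate: List[float] = []
    for d in range(N_DEPTHS):
        # Stack the 5 iso-depth slices at this depth and map them to a 20D feature.
        stacked = [x for view in range(N_VIEWS) for x in volumes[view][d]]
        intermediate.extend(linear(w_mid, stacked))
    # 128 depths x 20D = 2560 values, reduced to the final 12D feature.
    return linear(w_out, intermediate)
```

Sharing one per-depth map across all hypotheses mirrors the "loop over" wording in the caption; whether the actual network shares weights this way is not stated in this excerpt.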
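Figure 3 stores the optimized appearance as texture maps of GGX BRDF parameters. The GGX term that captures anisotropic highlights is its normal distribution function, which uses separate roughness values along the tangent and bitangent directions. The following is a minimal sketch of the standard anisotropic GGX distribution, with the half vector expressed in the local shading frame. The full BRDF the paper fits (for example, Fresnel, shadowing-masking, and albedo terms) is not reproduced here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def ggx_anisotropic_ndf(h: Sequence[float], alpha_x: float, alpha_y: float) -> float:
    """Anisotropic GGX normal distribution D(h).

    h is a unit half vector in the local frame: x along the tangent,
    y along the bitangent, z along the surface normal.
    D(h) = 1 / (pi * ax * ay * ((hx/ax)^2 + (hy/ay)^2 + hz^2)^2)
    """
    hx, hy, hz = h
    if hz <= 0.0:
        return 0.0  # microfacets facing away from the normal do not contribute
    t = (hx / alpha_x) ** 2 + (hy / alpha_y) ** 2 + hz ** 2
    return 1.0 / (math.pi * alpha_x * alpha_y * t * t)
```

At `h = (0, 0, 1)`, the distribution peaks at `1 / (pi * alpha_x * alpha_y)`. With `alpha_x > alpha_y`, tilting `h` toward the tangent falls off more slowly than tilting it toward the bitangent, which stretches the highlight along the tangent. When the two roughness values are equal, the expression reduces to the familiar isotropic GGX form.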