Learning to Infer Parameterized Representations of Plants from 3D Scans

Samara Ghrer, Christophe Godin, Stefanie Wuhrer

TL;DR

The paper tackles automatic extraction of a parameterized plant architecture from 3D scans by learning a latent space of L-Strings (binary axial-tree representations) via recursive auto-encoders trained on synthetic plants. A PointNet-based encoder maps input point clouds into this latent space, enabling direct inference of a complete parametric representation that supports 3D reconstruction, skeletonization, and segmentation, with test-time optimization to align reconstructions to observed data. Key contributions include a data-driven shape space for 3D plants learned from synthetic data, a binary-tree L-String encoding, and demonstrated generalization to real Chenopodium album scans while maintaining competitive performance with strong baselines. The approach yields compact representations that unify multiple phenotyping tasks, offering practical impact for plant phenotyping and virtual-plant applications.

Abstract

Plants frequently contain numerous organs, organized in 3D branching systems that define the plant's architecture. Reconstructing the architecture of plants from unstructured observations is challenging because of self-occlusion and spatial proximity between organs, which are often thin structures. To address this task, we propose an approach that infers a parameterized representation of the plant's architecture from a given 3D scan of a plant. In addition to the plant's branching structure, this representation contains parametric information for each plant organ, and can therefore be used directly in a variety of tasks. In this data-driven approach, we train a recursive neural network on virtual plants generated using a procedural model. After training, the network infers a parametric tree-like representation from an input 3D point cloud. Our method is applicable to any plant that can be represented as a binary axial tree. We quantitatively evaluate our approach on Chenopodium album plants on reconstruction, segmentation and skeletonization, which are important problems in plant phenotyping. In addition to carrying out several tasks at once, our method achieves results on par with strong baselines for each task. We apply our method, trained exclusively on synthetic data, to real 3D scans and show that it generalizes well.

Paper Structure

This paper contains 33 sections, 3 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Our method takes a 3D point cloud of a plant as input, and outputs a parameterized representation of the plant. This representation encodes the plant's branching structure and geometry along with semantic information such as organ type, and allows for multiple tasks including reconstruction, organ segmentation and skeleton extraction.
  • Figure 2: Overview: our method learns a latent space $\mathcal{S}$ that allows 3D point clouds to be mapped to L-Strings. At inference, the point cloud is mapped to $\mathcal{S}$ using the point cloud encoder on the left, and the resulting latent point is decoded into the corresponding L-String using the L-String decoder on the right.
  • Figure 3: The two criteria to merge/split points in latent space shown on an example tree. Merging is applied recursively in a bottom-up manner until the whole tree is merged into one point, while the splitting performs the inverse operation.
  • Figure 4: The network architectures used for the recursive encoder and decoder. A binary tree structure is recursively encoded into latent space $\mathcal{S}$. Latent points $s_1$ and $s_2$, each of dimension $dim_{\mathcal{S}}$, are merged by the encoder into $s_{1,2}$, which also has dimension $dim_{\mathcal{S}}$. Symmetrically, the decoder splits $s_{1,2}$ into two latent points $s_1$ and $s_2$. $dim_{h}$ denotes the dimension of the hidden layer.
  • Figure 5: Comparison to SIREN (Sitzmann et al., 2020) for 3D reconstruction.
  • ...and 6 more figures
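
The recursive merge and split described in the captions of Figures 3 and 4 can be illustrated with a minimal sketch. It is not the authors' implementation: the sizes `DIM_S` and `DIM_H`, the one-hidden-layer networks, the `tanh` activations, and the random untrained weights are all assumptions made for illustration, and the sketch uses a single merge rule, whereas Figure 3 refers to two merge/split criteria. It only shows how a binary tree of latent points collapses bottom-up into one point of the same dimension, and how a split produces two points again.

```python
import numpy as np

# Hypothetical sizes; the source does not state the latent or hidden dimensions.
DIM_S, DIM_H = 8, 16
rng = np.random.default_rng(0)


def make_mlp(d_in, d_hidden, d_out):
    """Random (untrained) weights for a one-hidden-layer network."""
    return (rng.normal(0.0, 0.1, (d_hidden, d_in)), np.zeros(d_hidden),
            rng.normal(0.0, 0.1, (d_out, d_hidden)), np.zeros(d_out))


def run_mlp(weights, x):
    """Apply the network: input -> hidden layer (size DIM_H) -> output."""
    w1, b1, w2, b2 = weights
    return np.tanh(w2 @ np.tanh(w1 @ x + b1) + b2)


MERGE = make_mlp(2 * DIM_S, DIM_H, DIM_S)  # (s1, s2) -> s12
SPLIT = make_mlp(DIM_S, DIM_H, 2 * DIM_S)  # s12 -> (s1, s2)


def merge(s1, s2):
    """Encoder step: two latent points become one of the same dimension."""
    return run_mlp(MERGE, np.concatenate([s1, s2]))


def split(s12):
    """Decoder step: one latent point becomes two."""
    out = run_mlp(SPLIT, s12)
    return out[:DIM_S], out[DIM_S:]


def encode_tree(tree):
    """Bottom-up encoding. A tree is a leaf vector or a (left, right) tuple."""
    if isinstance(tree, tuple):
        left, right = tree
        return merge(encode_tree(left), encode_tree(right))
    return np.asarray(tree, dtype=float)


# A three-leaf binary tree: ((leaf0, leaf1), leaf2).
leaves = [rng.normal(size=DIM_S) for _ in range(3)]
root = encode_tree(((leaves[0], leaves[1]), leaves[2]))
s_left, s_right = split(root)
```

Because the weights are untrained, `split(root)` does not recover the original children; in a trained auto-encoder the split would be optimized to invert the merge, and a decoder would also need a rule for when to stop splitting, which this sketch omits.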