3D Multimodal Image Registration for Plant Phenotyping

Eric Stumpe, Gernot Bodner, Francesco Flagiello, Matthias Zeppelzauer

TL;DR

This work tackles pixel-accurate registration across multimodal plant imaging systems by introducing a depth-guided 3D registration pipeline. A 3D canopy mesh derived from a time-of-flight (ToF) depth map enables ray-casting-based pixel correspondence between cameras, while an automated occlusion and uncertainty classification separates legitimate matches from erroneous ones. The approach yields registered images and multimodal point clouds across RGBD, thermal, and hyperspectral modalities, scales in principle to any number of cameras, and is validated on six plant species. This enables robust cross-modal analyses for plant phenotyping tasks such as leaf segmentation, stress assessment, and multispectral data fusion, with practical guidance on calibration, hardware, and limitations.

Abstract

Combining multiple camera technologies in a single multimodal monitoring system offers promising benefits for plant phenotyping. Compared to configurations that use only one camera technology, such a system can record cross-modal patterns that allow a more comprehensive assessment of plant phenotypes. However, exploiting cross-modal patterns effectively depends on precise image registration with pixel-accurate alignment, a challenge often complicated by the parallax and occlusion effects inherent in plant canopy imaging. In this study, we propose a novel multimodal 3D image registration method that addresses these challenges by integrating depth information from a time-of-flight camera into the registration process. By leveraging depth data, our method mitigates parallax effects and thus facilitates more accurate pixel alignment across camera modalities. Additionally, we introduce an automated mechanism to identify and distinguish between different types of occlusions, thereby minimizing the introduction of registration errors. To evaluate the efficacy of our approach, we conduct experiments on a diverse image dataset comprising six distinct plant species with varying leaf geometries. Our results demonstrate the robustness of the proposed registration algorithm, showing that it achieves accurate alignment across different plant types and camera compositions. Unlike previous methods, it does not rely on detecting plant-specific image features and can therefore be used for a wide variety of applications in plant sciences. In principle, the registration approach scales to arbitrary numbers of cameras with different resolutions and wavelengths. Overall, our study contributes to advancing the field of plant phenotyping by offering a robust and reliable solution for multimodal image registration.
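
To illustrate why depth information mitigates parallax, the following minimal sketch lifts a depth-camera pixel to a 3D point and projects it into a second camera. It assumes an ideal pinhole model without lens distortion and depth measured along the optical axis. The function names, the intrinsic matrix `K`, and the pose `(R, t)` are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with depth along the optical axis to a 3D point (camera frame)."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.array([x, y, depth])

def project(point, K):
    """Project a 3D point given in the camera frame to pixel coordinates."""
    uvw = K @ point
    return uvw[:2] / uvw[2]

def reproject(u, v, depth, K_d, K_s, R, t):
    """Map a depth-camera pixel into a second camera, using X_s = R @ X_d + t."""
    X_d = backproject(u, v, depth, K_d)
    return project(R @ X_d + t, K_s)

# Hypothetical setup: identical intrinsics, second camera shifted 10 cm along x.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-0.1, 0.0, 0.0])

near = reproject(320, 240, 0.5, K, K, R, t)  # surface 0.5 m from the camera
far = reproject(320, 240, 2.0, K, K, R, t)   # surface 2.0 m from the camera
print(near, far)
```

In this toy setup the same source pixel lands 100 px from its original column when the surface is 0.5 m away, but only 25 px when it is 2.0 m away. A single 2D transformation cannot reproduce such a depth-dependent shift, which is why a per-pixel depth value is useful for registration.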

Paper Structure

This paper contains 56 sections, 3 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Pixel alignment after registration. Our method registers multiple images from different types of cameras, enabling the combined processing of multimodal information at pixel level.
  • Figure 2: Visualization of our 3D multimodal registration approach covering the entire registration pipeline.
  • Figure 3: 3D projection principle. Geometrical example for projecting pixel information for registration. $C_D$ is the depth camera, while $C_T$ and $C_S$ are arbitrary cameras at different positions that represent the target view and source view respectively. Image $C_T$ and Image $C_S$ show the images recorded by both cameras of the observed car in 3D space (bottom right). The blue line represents the epipolar line that corresponds to $p(u,v,1)_{C_T}$ being projected into 3D space. The green and orange lines represent the projection lines of $C_T$ and $C_D$ respectively.
  • Figure 4: Schematic visualization of the generation of the 3D mesh from a depth map, using two adjacent leaves as an example. The depth-map distance of each pixel is represented by a color gradient from blue (near) to yellow (far). Each pixel (e.g., the white dot) is converted to a mesh vertex and connected to its 8 neighbors via mesh edges. Edges are omitted if the vertical angle exceeds 15°, which can occur for pixels on adjacent leaves (red lines).
  • Figure 5: Projection cases. a) Different cases of projection errors and uncertainties that can occur between different cameras $C_T$ and $C_S$, with $C_D$ being the depth camera: 1) legitimate projections, 2) occlusion error, 3.1) incoming uncertain correspondence, 3.2) uncertain correspondence. Cases 4, 5 and 6 are only relevant from the perspective of the target camera $C_T$: 4) indicates the certain canopy area, 5) represents the uncertain area, and 6) represents the certain background area. b) The uncertainty mesh: the green polygons exemplify a simplified canopy object mesh $\mathcal{M}_o$. The brown area represents the ground plane. By shooting rays from the depth camera $C_D$ through the vertices of the border edges of $\mathcal{M}_o$, we find the intersections with the ground plane, which allows us to construct the uncertainty mesh $\mathcal{M}_u$ (blue mesh).
  • ...and 8 more figures
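
The ray-shooting step described in the caption of Figure 5b can be sketched as a ray-plane intersection. This is a minimal illustration under stated assumptions: the ground is an ideal plane given by a point and a normal, the depth camera is reduced to its optical center, and the helper names `ray_plane_intersection` and `project_border_to_ground` are hypothetical. How the paper triangulates $\mathcal{M}_u$ from the resulting points is not covered here.

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Return the point where a ray hits a plane, or None if there is no hit."""
    denom = np.dot(plane_normal, direction)
    if abs(denom) < eps:
        return None  # ray runs parallel to the plane
    s = np.dot(plane_normal, plane_point - origin) / denom
    if s <= 0:
        return None  # plane lies behind the ray origin
    return origin + s * direction

def project_border_to_ground(cam_center, border_vertices, plane_point, plane_normal):
    """Extend rays from the camera center through each border vertex to the ground plane."""
    hits = []
    for vertex in border_vertices:
        direction = vertex - cam_center
        hits.append(ray_plane_intersection(cam_center, direction, plane_point, plane_normal))
    return hits

# Hypothetical scene: camera 1 m above the ground plane z = 0, looking down.
cam_center = np.array([0.0, 0.0, 1.0])
plane_point = np.array([0.0, 0.0, 0.0])
plane_normal = np.array([0.0, 0.0, 1.0])
border_vertices = [np.array([0.1, 0.0, 0.5]), np.array([-0.2, 0.1, 0.75])]

ground_hits = project_border_to_ground(cam_center, border_vertices, plane_point, plane_normal)
print(ground_hits)
```

Each border vertex and its ground hit lie on the same viewing ray of the depth camera, so the region between them is exactly what the depth camera cannot observe behind the canopy border.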