GMT: Guided Mask Transformer for Leaf Instance Segmentation

Feng Chen, Sotirios A. Tsaftaris, Mario Valerio Giuffrida

TL;DR

Leaf instance segmentation in plants is hard due to small, diverse, and occluded leaves. GMT addresses this by injecting leaf spatial priors through harmonic guide functions into a Transformer-based segmentor, via Guided Positional Encoding, Guided Embedding Fusion, and Guided Dynamic Positional Queries, learned with a dedicated auxiliary loss. It achieves state-of-the-art performance on CVPPP LSC, MSU-PID, and KOMATSUNA, with pronounced gains for small and overlapped leaves and robust ablations validating its components. The approach promises improved plant phenotyping capabilities by enabling more accurate and reliable leaf delineation in challenging scenes.
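Below is a minimal NumPy sketch of how harmonic guide functions could act as a positional encoding. It assumes that each guide function is a sinusoid of normalised pixel coordinates, g_k(x, y) = sin(a_k x + b_k y + c_k), in the spirit of harmonic embeddings. That form is not stated in this summary. The function names and the projection matrix `proj`, which stands in for a learned linear layer, are hypothetical and are not GMT's released code.

```python
import numpy as np


def harmonic_guide_functions(h, w, freqs, phases):
    """Evaluate K harmonic guide functions on an h x w pixel grid.

    Assumption (illustrative only): g_k(x, y) = sin(a_k * x + b_k * y + c_k),
    with x and y normalised to [0, 1].
    freqs:  (K, 2) array of (a_k, b_k).
    phases: (K,) array of c_k.
    Returns an array of shape (K, h, w).
    """
    # ys varies along rows (axis 0), xs along columns (axis 1); both are (h, w).
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, h),
                         np.linspace(0.0, 1.0, w), indexing="ij")
    # Broadcast each (a_k, b_k, c_k) over the whole grid -> (K, h, w).
    arg = (freqs[:, 0, None, None] * xs
           + freqs[:, 1, None, None] * ys
           + phases[:, None, None])
    return np.sin(arg)


def guided_positional_encoding(guides, proj):
    """Project the K guide maps to a C-channel positional encoding.

    guides: (K, h, w) guide maps; proj: (C, K) hypothetical learned weights.
    Returns an array of shape (C, h, w).
    """
    return np.einsum("ck,khw->chw", proj, guides)


# Usage: 8 random guide functions on a 16 x 16 grid, projected to 32 channels.
rng = np.random.default_rng(0)
freqs = rng.uniform(-10.0, 10.0, size=(8, 2))
phases = rng.uniform(0.0, 2.0 * np.pi, size=8)
guides = harmonic_guide_functions(16, 16, freqs, phases)
pe = guided_positional_encoding(guides, rng.normal(size=(32, 8)))
```

Because each pixel receives the vector of its K guide values, pixels at different positions get different codes. The projection then matches them to the feature width of the segmentor.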

Abstract

Leaf instance segmentation is a challenging multi-instance segmentation task, aiming to separate and delineate each leaf in an image of a plant. Accurate segmentation of each leaf is crucial for plant-related applications such as the fine-grained monitoring of plant growth and crop yield estimation. This task is challenging because of the high similarity (in shape and colour), great size variation, and heavy occlusions among leaf instances. Furthermore, the typically small size of annotated leaf datasets makes it more difficult to learn the distinctive features needed for precise segmentation. We hypothesise that the key to overcoming these challenges lies in the specific spatial patterns of leaf distribution. In this paper, we propose the Guided Mask Transformer (GMT), which leverages and integrates leaf spatial distribution priors into a Transformer-based segmentor. These spatial priors are embedded in a set of guide functions that map leaves at different positions into a more separable embedding space. Our GMT consistently outperforms the state-of-the-art on three public plant datasets. Our code is available at https://github.com/vios-s/gmt-leaf-ins-seg.
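To make "a more separable embedding space" concrete, the sketch below encodes each binary leaf mask as the average of the guide maps over its pixels. It then reports the smallest distance between any two leaf embeddings. Both `encode_leaves` and `separation_margin` are hypothetical illustrations. The margin is one possible separability score, not the objective used to fit GMT's guide functions.

```python
import numpy as np


def encode_leaves(masks, guides):
    """Map each binary leaf mask to a K-dim embedding.

    masks:  (N, h, w) boolean leaf masks.
    guides: (K, h, w) guide maps (any source; e.g. harmonic functions).
    Returns (N, K): the mean guide vector over each leaf's pixels.
    Empty masks map to the zero vector.
    """
    m = masks.astype(float)
    areas = m.sum(axis=(1, 2))                  # (N,) pixel counts
    sums = np.einsum("nhw,khw->nk", m, guides)  # (N, K) summed guide values
    return sums / np.maximum(areas, 1.0)[:, None]


def separation_margin(emb):
    """Smallest Euclidean distance between two distinct leaf embeddings.

    A larger margin means the leaves are easier to tell apart in the
    embedding space. Returns inf when there are fewer than two leaves.
    """
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)  # (N, N)
    np.fill_diagonal(d, np.inf)  # ignore each leaf's distance to itself
    return d.min()
```

A leaf at one position and a leaf at another receive different average guide vectors. Guide functions that make this margin large therefore help a segmentor assign pixels to the correct instance, even when leaves look alike.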

Paper Structure (15 sections, 16 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Examples of leaf instance segmentation. Each row presents an example from a different dataset, showing, from left to right, the original image, the ground-truth label, the Mask2Former (cheng2022masked) prediction, and the segmentation result of our proposed GMT.
  • Figure 2: An illustration of the Guided Mask Transformer (GMT). The key components of GMT are Guided Positional Encoding (GPE), Guided Embedding Fusion Model (GEFM), and Guided Dynamic Positional Queries (GDPQ). These components enable effective integration with the guide functions, which carry prior knowledge of the instances' spatial distribution. (Best viewed in colour.)
  • Figure 3: Auxiliary supervision at GEFM. The ground-truth (GT) instance masks are encoded by the trained guide functions to produce the GT guided embeddings. The guided features, which are projected from the final output of the pixel decoder, are supervised by these embeddings with an $L_1$ loss.
  • Figure 4: GDPQ Module. The positional queries $Q^{t}_{p}$ at the current Transformer block are dynamically generated from the guide-function-encoded mask predictions of the previous block. $Q^{t-1}$ denotes the object queries from the previous Transformer block. The dimensions of the different elements are shown: $n$ is the number of object queries, and $h$ and $w$ denote the spatial dimensions of the final pixel features. The computation of $\mathcal{E}(\mathcal{S}^{t-1}; \Psi)$, as defined in the GDPQ equation (eq:gdpq), is highlighted in red.
  • Figure 5: Qualitative results on the CVPPP LSC validation set and the MSU-PID and KOMATSUNA test sets. The Symmetric Best Dice (SBD), the main segmentation metric, is displayed for each model prediction.
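Figures 3 and 4 both rely on encoding masks with the guide functions. The NumPy sketch below illustrates both uses under several assumptions that the figure captions do not confirm. First, $\mathcal{E}(\mathcal{S}^{t-1}; \Psi)$ is taken to be a soft-mask weighted average of the guide maps. Second, a hypothetical linear projection `proj` turns that average into the positional queries $Q^{t}_{p}$. Third, the GT guided embedding gives each foreground pixel the mean guide vector of its leaf. Finally, the $L_1$ loss is averaged over foreground pixels only, which is our own choice. All function names are illustrative and do not come from the released code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def encode_soft_masks(mask_logits, guides, eps=1e-6):
    """Sketch of E(S; Psi): guide maps averaged under each predicted mask.

    mask_logits: (n, h, w) mask predictions from the previous block.
    guides:      (K, h, w) guide maps.
    Returns (n, K).
    """
    probs = sigmoid(mask_logits)
    # Normalise each soft mask to sum to (almost) one over the image.
    weights = probs / (probs.sum(axis=(1, 2), keepdims=True) + eps)
    return np.einsum("nhw,khw->nk", weights, guides)


def dynamic_positional_queries(mask_logits, guides, proj):
    """Q_p^t from the encoded masks; proj is a hypothetical (d, K) layer."""
    return encode_soft_masks(mask_logits, guides) @ proj.T  # (n, d)


def gt_guided_embedding(instance_ids, guides):
    """Per-pixel target (K, h, w) built from an (h, w) instance-ID map.

    Each leaf pixel gets the mean guide vector of its leaf; background
    (ID 0) stays zero.
    """
    target = np.zeros_like(guides)
    for leaf in np.unique(instance_ids):
        if leaf == 0:
            continue
        m = instance_ids == leaf
        # (K, 1) leaf mean, broadcast to every pixel of this leaf.
        target[:, m] = guides[:, m].mean(axis=1, keepdims=True)
    return target


def aux_l1_loss(guided_features, target, fg):
    """Mean absolute error between predicted guided features and target,
    restricted to foreground pixels (fg: (h, w) boolean)."""
    return np.abs(guided_features - target)[:, fg].mean()
```

Encoding predicted masks (Figure 4) and GT masks (Figure 3) with the same guide maps keeps the queries and the auxiliary targets in one embedding space. This is one plausible reading of why the same guide functions are reused in both places.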