SG-DOR: Learning Scene Graphs with Direction-Conditioned Occlusion Reasoning for Pepper Plants

Rohit Menon, Niklas Mueller-Goldingen, Sicong Pan, Gokul Krishna Chenchani, Maren Bennewitz

TL;DR

This work presents SG-DOR (Scene Graphs with Direction-Conditioned Occlusion Reasoning), a relational framework that, given instance-segmented organ point clouds, infers a scene graph encoding physical attachments and direction-conditioned occlusion.

Abstract

Robotic harvesting in dense crop canopies requires effective interventions that depend not only on geometry, but also on explicit, direction-conditioned relations identifying which organs obstruct a target fruit. We present SG-DOR (Scene Graphs with Direction-Conditioned Occlusion Reasoning), a relational framework that, given instance-segmented organ point clouds, infers a scene graph encoding physical attachments and direction-conditioned occlusion. We introduce an occlusion ranking task for retrieving and ranking candidate leaves for a target fruit and approach direction, and propose a direction-aware graph neural architecture with per-fruit leaf-set attention and union-level aggregation. Experiments on a multi-plant synthetic pepper dataset show improved occlusion prediction (F1=0.73, NDCG@3=0.85) and attachment inference (edge F1=0.83) over strong ablations, yielding a structured relational signal for downstream intervention planning.
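The occlusion ranking task above is scored with NDCG@3. As a reminder of what that metric measures, here is a minimal sketch of NDCG@k using the standard linear-gain, log2-discount form. The exact gain definition used in the paper is not stated here, so the relevance values (taken to be ground-truth occlusion scores per candidate leaf) and the function names `dcg_at_k` and `ndcg_at_k` are illustrative assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_scores, true_relevances, k=3):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    # Rank candidate leaves by predicted score, highest first.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevances[i] for i in order]
    ideal = sorted(true_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: three candidate leaves for one fruit and one direction.
true_occlusion = [0.4, 0.3, 0.0]          # assumed ground-truth relevance
perfect = ndcg_at_k([0.9, 0.5, 0.1], true_occlusion)
swapped = ndcg_at_k([0.5, 0.9, 0.1], true_occlusion)
```

A prediction that orders the leaves exactly as the ground truth does scores 1.0; swapping the top two leaves lowers the score, because the most occluding leaf is discounted by its lower rank.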

Paper Structure (37 sections, 24 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Overview of SG-DOR. Left: Instance-segmented 3D sweet pepper plants. Center: The SG-DOR network performs attention-based relational reasoning over the induced graph representation. Right: Inferred scene graph encoding structural attachments and direction-conditioned occlusion ranking between plant organs.
  • Figure 2: Geometric occlusion modeling via depth comparison. L1 and L2 are candidate leaves occluding fruit f, with L1 closer to the fruit. Along the +x-axis, the potential occlusion score of L1 is 0.4 and that of L2 is 0.3. Because L1 and L2 overlap, their union occlusion score along the +x-axis is 0.5.
  • Figure 3: SG-DOR Learning Architecture. Instance-level point sets are encoded by a PointNet++ module to obtain identity embeddings, from which a candidate graph is constructed and refined using a residual GINE backbone with edge features. Identity and relational features are concatenated and fed to joint multi-task heads for semantic, geometric, and structural predictions. An additional attention-based occlusion module, shown in detail in Fig. 4, performs direction-conditioned leaf–fruit reasoning and is trained with ranking and visibility losses.
  • Figure 4: Cross-attention architecture for occlusion prediction. A direction-specific fruit query $\mathbf{q}_{i,k}$ computes pairwise occlusion potentials $\hat{s}_{ij,k}$ against self-attended, contextualized leaf features. These potentials act as attention weights to aggregate the leaf features into a single context vector $\mathbf{c}_{i,k}$, which is then concatenated with the initial query to predict the visibility reduction $\hat{u}_{i,k}$.
  • Figure 5:
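The numbers in the Figure 2 caption (0.4 and 0.3 for the individual leaves, 0.5 for their union) can be reproduced with a small worked example. The sketch below assumes that an occlusion score is the fraction of the fruit's projected pixels at which a leaf is nearer to the viewer than the fruit; the caption does not spell out the paper's exact definition, so this pixel-coverage reading, the one-dimensional 10-pixel projection, the depth values, and the helper names `occluding_mask` and `coverage` are all illustrative assumptions.

```python
import numpy as np


def occluding_mask(fruit_depth, leaf_depth):
    """Pixels where the leaf is nearer to the viewer than the fruit."""
    # np.inf marks pixels where an organ is absent, so those never occlude.
    return np.isfinite(fruit_depth) & (leaf_depth < fruit_depth)


def coverage(fruit_depth, mask):
    """Fraction of the fruit's projected pixels covered by the mask."""
    fruit_pixels = np.isfinite(fruit_depth).sum()
    return float(mask.sum() / fruit_pixels) if fruit_pixels else 0.0


# Hypothetical projection along +x: the fruit fills all 10 pixels at depth 2.0.
fruit = np.full(10, 2.0)

# Leaf 1 sits closer to the fruit (depth 1.5) and covers pixels 0..3.
leaf_1 = np.full(10, np.inf)
leaf_1[0:4] = 1.5
# Leaf 2 sits farther in front (depth 1.0) and covers pixels 2..4.
leaf_2 = np.full(10, np.inf)
leaf_2[2:5] = 1.0

mask_1 = occluding_mask(fruit, leaf_1)
mask_2 = occluding_mask(fruit, leaf_2)

s1 = coverage(fruit, mask_1)                # 4 of 10 pixels
s2 = coverage(fruit, mask_2)                # 3 of 10 pixels
s_union = coverage(fruit, mask_1 | mask_2)  # pixels 0..4, 5 of 10
s_overlap = coverage(fruit, mask_1 & mask_2)  # pixels 2..3, 2 of 10
```

Under these assumptions the union score is 0.5 rather than 0.4 + 0.3 = 0.7, because the two leaves share 0.2 of the fruit's area. This is why a union-level score carries information that per-leaf scores alone do not.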