Enhancing Agricultural Environment Perception via Active Vision and Zero-Shot Learning

Michele Carlo La Greca, Mirko Usuelli, Matteo Matteucci

TL;DR

This work leverages Active Vision techniques and Zero-Shot Learning to improve a robot's ability to perceive and interact with agricultural environments in the context of fruit harvesting, outperforming traditional and static predefined planning methods.

Abstract

Agriculture, fundamental for human sustenance, faces unprecedented challenges. The need for efficient, human-cooperative, and sustainable farming methods has never been greater. This work leverages Active Vision (AV) techniques and Zero-Shot Learning (ZSL) to improve a robot's ability to perceive and interact with agricultural environments in the context of fruit harvesting. The AV pipeline, implemented in ROS 2, integrates Next-Best View (NBV) planning for 3D environment reconstruction through a dynamic 3D occupancy map. The system lets the robotic arm dynamically plan and move to the most informative viewpoints and explore the environment, updating the 3D reconstruction with semantic information produced by ZSL models. Simulation and real-world experiments demonstrate the system's effectiveness in complex visibility conditions, where it outperforms traditional and static predefined planning methods. The ZSL segmentation models employed, such as YOLO World + EfficientViT SAM, run at high speed and segment accurately, which gives flexibility when handling semantic information in unknown agricultural contexts without requiring any fine-tuning.

Paper Structure (19 sections, 2 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 2: The diagram shows a 6-DoF robotic arm with a camera mounted on top. The system explores the agricultural environment by passing state data from the Robot Block to the Active Vision Pipeline Block. This pipeline performs ZSL segmentation via the Segmentation Server Block and updates the Semantic 3D Occupancy Map. After the update, the ray-casting utility optimization is run to generate the NBV, which is then used to close the robot's control loop.
  • Figure 3: Qualitative comparison of exploration planning convergence across four scenarios of a simulated tomato plant.
  • Figure 4: Real-world visual comparison regarding unoriented start conditions.
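
The ray-casting utility step described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the voxel encoding, the function names (`cast_ray`, `view_utility`, `next_best_view`), and the utility itself (a plain count of unknown voxels reachable from a viewpoint) are assumptions made for illustration, and the semantic weighting from the ZSL models is left out. Rays are marched at a fixed step, which is simpler but less exact than a proper voxel traversal.

```python
import math

# Voxel states for a toy occupancy map (hypothetical encoding).
UNKNOWN, FREE, OCCUPIED = 0, 1, 2


def cast_ray(grid, origin, direction, max_range, step=0.5):
    """Collect the unknown voxels a ray crosses before an occupied voxel blocks it."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = tuple(c / norm for c in direction)
    unknown_hit = set()
    t = 0.0
    while t <= max_range:
        voxel = tuple(math.floor(o + t * u) for o, u in zip(origin, unit))
        state = grid.get(voxel, FREE)  # voxels outside the map count as free space
        if state == OCCUPIED:
            break  # the ray is occluded from here on
        if state == UNKNOWN:
            unknown_hit.add(voxel)
        t += step
    return unknown_hit


def view_utility(grid, origin, directions, max_range):
    """Utility of a viewpoint: number of distinct unknown voxels its rays can reach."""
    visible = set()
    for direction in directions:
        visible |= cast_ray(grid, origin, direction, max_range)
    return len(visible)


def next_best_view(grid, candidates, max_range):
    """Return (index, utility) of the candidate viewpoint with the highest utility."""
    utilities = [view_utility(grid, o, dirs, max_range) for o, dirs in candidates]
    best = max(range(len(candidates)), key=utilities.__getitem__)
    return best, utilities[best]
```

Each candidate is an `(origin, directions)` pair, where `directions` is a list of ray directions approximating the camera's field of view. In a closed loop, the selected viewpoint would be sent to the arm, the map updated from the new observation, and the selection repeated until the utility of every candidate falls below some threshold.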