Language-Guided Object Search in Agricultural Environments

Advaith Balaji, Saket Pradhan, Dmitry Berenson

TL;DR

This work tackles object search in loosely semantically organized agricultural environments by proposing LOSAE, a language-guided approach that reasons about the location of an unseen target using only previously seen objects and an LLM. LOSAE explores the environment to build an object memory, uses an LLM to estimate a probability distribution over the unseen target's location, and plans a waypoint-based path that balances travel distance against semantic affinity, optionally grasping the target once it is found. Real-world experiments on a Boston Dynamics Spot robot achieve an 80% success rate and a success weighted by path length (SPL) of 0.67, and offline testing yields an average path efficiency of 84% relative to the ideal path. The results indicate that language-based reasoning over object-to-object relationships can effectively guide search in unstructured farm settings, highlighting potential for scalable, low-cost deployment in agricultural robotics.

Abstract

Creating robots that can assist in farms and gardens can help reduce the mental and physical workload experienced by farm workers. We tackle the problem of object search in a farm environment, providing a method that allows a robot to semantically reason about the location of an unseen target object among a set of previously seen objects in the environment using a Large Language Model (LLM). We leverage object-to-object semantic relationships to plan a path through the environment that will allow us to accurately and efficiently locate our target object while also reducing the overall distance traveled, without needing high-level room or area-level semantic relationships. During our evaluations, we found that our method outperformed a current state-of-the-art baseline and our ablations. Our offline testing yielded an average path efficiency of 84%, reflecting how closely the predicted path aligns with the ideal path. Upon deploying our system on the Boston Dynamics Spot robot in a real-world farm environment, we found that our system had a success rate of 80%, with a success weighted by path length of 0.67, which demonstrates a reasonable trade-off between task success and path efficiency under real-world conditions. The project website can be viewed at https://adi-balaji.github.io/losae/
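The abstract reports a success weighted by path length (SPL) of 0.67 alongside an 80% success rate. As a minimal sketch of how such a number is typically computed, the snippet below uses the standard SPL definition from the embodied-navigation literature: each successful episode contributes the ratio of the shortest path length to the path length actually traveled, and failures contribute zero. The function name `spl` and the episode values are hypothetical and chosen only for illustration; they are not the paper's data.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    successes[i]: whether episode i found the target.
    shortest[i]:  length of the ideal (shortest) path for episode i.
    taken[i]:     length of the path the robot actually traveled.
    """
    if not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("all inputs must have the same length")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for ok, ideal, actual in zip(successes, shortest, taken):
        if ok:
            # max() caps the per-episode ratio at 1.0 even if the logged
            # path is slightly shorter than the nominal ideal path.
            total += ideal / max(actual, ideal)
    return total / len(successes)


# Hypothetical episodes (illustrative values only): 4 of 5 succeed.
episode_success = [True, True, False, True, True]
ideal_lengths = [10.0, 8.0, 12.0, 5.0, 20.0]
traveled_lengths = [12.5, 8.0, 30.0, 10.0, 25.0]

print(round(spl(episode_success, ideal_lengths, traveled_lengths), 3))  # 0.62
```

In this toy example the success rate is 0.8 while the SPL is 0.62, which shows why SPL is always at or below the success rate: detours on successful episodes are penalized in addition to outright failures.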

Paper Structure

This paper contains 15 sections, 4 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: LOSAE allows the robot to find an unseen object using previously seen objects as instruments of reasoning. Here, the robot can infer that a tool like a drill is most likely located near similar tools such as a screwdriver, chisel, or shovel.
  • Figure 2: The robot is tasked with finding a target object $x_t$ based on a user query. The robot uses an LLM for semantic reasoning, generating a probability distribution $P$ from the object-to-object relationships between each seen object $x_s \in X_s$ and the target object $x_t$. This distribution is used to calculate a waypoint score $s(w_i)$ for each waypoint $w_i$. The robot plans a path through the waypoints according to a cost function $C$ that balances visiting high-scoring waypoints against keeping the path short (see Methods for more details). The robot then navigates to a waypoint, inspects the surrounding objects, and grasps the target if it is found; if not, it navigates to the next waypoint and continues the search.
  • Figure 3: Images from our custom YOLOv8 dataset showcasing the objects and environment we work with. Top left: farm cart, drill, and pliers by the field. Middle: watering can, hammer, water hose nozzle, and bolt cutters by the tool shelf and table in the tool storage. Bottom left: water tap, water pipe, hand rake and screwdriver by the water station. Bottom right: shovel and chisel.
  • Figure 4: Left: an instance of the robot correctly identifying the target object at the correct location. Right: an instance of a false positive. The robot correctly finds the watering can next to the water tap and water hose nozzle, but grasps the water tap due to perceptual errors.
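The pipeline described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the caption does not give the form of $s(w_i)$ or $C$, so the code assumes that a waypoint's score is the sum of the LLM probabilities of the seen objects near it, and that the path is built greedily by minimizing a hypothetical cost `distance - alpha * score`. The probabilities stand in for LLM output, and all names (`waypoint_scores`, `plan_path`, `search`, `alpha`, the waypoint labels and coordinates) are made up for the example.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]


def waypoint_scores(probs: Dict[str, float],
                    objects_at: Dict[str, List[str]]) -> Dict[str, float]:
    """Assumed s(w_i): sum of P(x_s) over the seen objects near waypoint w_i."""
    return {w: sum(probs.get(obj, 0.0) for obj in objs)
            for w, objs in objects_at.items()}


def plan_path(start: Point,
              positions: Dict[str, Point],
              scores: Dict[str, float],
              alpha: float = 10.0) -> List[str]:
    """Greedy ordering under an assumed cost C = distance - alpha * score."""
    remaining = set(positions)
    current = start
    order: List[str] = []
    while remaining:
        # sorted() makes tie-breaking deterministic.
        best = min(sorted(remaining),
                   key=lambda w: math.dist(current, positions[w])
                   - alpha * scores[w])
        order.append(best)
        current = positions[best]
        remaining.remove(best)
    return order


def search(order: List[str],
           target_visible: Callable[[str], bool]) -> Optional[str]:
    """Visit waypoints in order; return the first one where the target is seen."""
    for waypoint in order:
        if target_visible(waypoint):
            return waypoint  # the robot would attempt a grasp here
    return None


# Stand-in for the LLM's distribution P over seen objects, target = "drill".
probs = {"screwdriver": 0.4, "chisel": 0.3, "shovel": 0.2,
         "water tap": 0.05, "farm cart": 0.05}
objects_at = {"tool_storage": ["screwdriver", "chisel"],
              "field": ["shovel", "farm cart"],
              "water_station": ["water tap"]}
positions = {"water_station": (2.0, 0.0), "field": (5.0, 0.0),
             "tool_storage": (8.0, 0.0)}

scores = waypoint_scores(probs, objects_at)
order = plan_path((0.0, 0.0), positions, scores)
print(order)  # ['tool_storage', 'field', 'water_station']
print(search(order, lambda w: w == "tool_storage"))  # tool_storage
```

With `alpha = 0` the same planner degenerates to nearest-first ordering (`water_station`, `field`, `tool_storage`), which illustrates the trade-off the caption describes: a larger weight on the score sends the robot to the semantically likely waypoint first even when it is farther away.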