
InCoRo: In-Context Learning for Robotics Control with Feedback Loops

Jiaqiang Ye Zhu, Carla Gomez Cano, David Vazquez Bermudez, Michal Drozdzal

TL;DR

InCoRo addresses the challenge of robust robotic control in dynamic environments by integrating in-context learning with a classical feedback loop. The system uses a pre-processor to decompose user instructions into atomic actions and objects, and a control loop, made up of an LLM controller, a perception unit, and ROS2-based robots, that adapts execution in real time. Empirical results on SCARA and DELTA arms show state-of-the-art static and dynamic performance, with large gains over the prior Code as Policies (CaP) baseline and clear evidence of robustness through ablations. This work demonstrates zero-shot generalization to new tasks and environments, advancing autonomous robotics toward adaptable, reliable operation in real-world settings.
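The pre-processing step can be pictured as few-shot prompting followed by parsing. The sketch below is a minimal, hypothetical illustration, not InCoRo's actual prompt: the example instructions, the `pick(...)`/`place(...)` action syntax, the `Actions:`/`Objects:` output format, and the helpers `build_prompt` and `parse_completion` are all assumptions, and the LLM call itself is replaced by a hand-written completion string.

```python
import re
from dataclasses import dataclass

# Hypothetical few-shot examples: (instruction, atomic actions, objects).
FEW_SHOT_EXAMPLES = [
    ("Put the red block in the bucket",
     ["pick(red block)", "place(bucket)"],
     ["red block", "bucket"]),
    ("Stack the green cube on the blue cube",
     ["pick(green cube)", "place(blue cube)"],
     ["green cube", "blue cube"]),
]


@dataclass
class Plan:
    actions: list[str]
    objects: list[str]


def build_prompt(instruction: str) -> str:
    """Concatenate worked examples, then the new instruction for the LLM to complete."""
    blocks = [
        f"Instruction: {text}\nActions: {'; '.join(acts)}\nObjects: {', '.join(objs)}\n"
        for text, acts, objs in FEW_SHOT_EXAMPLES
    ]
    # The prompt ends with "Actions:" so the model continues the established pattern.
    blocks.append(f"Instruction: {instruction}\nActions:")
    return "\n".join(blocks)


def parse_completion(completion: str) -> Plan:
    """Split the model's continuation into atomic actions and object names."""
    text = "Actions:" + completion  # restore the label the prompt ended with
    acts = re.search(r"Actions:[ \t]*(.*)", text)
    objs = re.search(r"Objects:[ \t]*(.*)", text)
    actions = [a.strip() for a in acts.group(1).split(";") if a.strip()] if acts else []
    objects = [o.strip() for o in objs.group(1).split(",") if o.strip()] if objs else []
    return Plan(actions, objects)


prompt = build_prompt("Drop the yellow ball into the box")
# In a real system `completion` would come from an LLM; here it is hand-written.
completion = " pick(yellow ball); place(box)\nObjects: yellow ball, box"
plan = parse_completion(completion)
print(plan.actions)  # ['pick(yellow ball)', 'place(box)']
print(plan.objects)  # ['yellow ball', 'box']
```

Because the examples live in the prompt rather than in model weights, adding a new task type in this sketch means adding an example, not retraining, which is the property the in-context learning approach relies on.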

Abstract

One of the challenges in robotics is to equip robotic units with reasoning capabilities robust enough to execute complex tasks in dynamic environments. Recent advances in LLMs have positioned them as go-to tools for simple reasoning tasks, motivating the pioneering work of Liang et al. [35], which uses an LLM to translate natural language commands into low-level static execution plans for robotic units. Embedding LLMs in robotic systems carries their generalization ability over to robotics, enabling zero-shot generalization to new tasks. This paper extends this prior work to dynamic environments. We propose InCoRo, a system that uses a classical robotic feedback loop composed of an LLM controller, a scene understanding unit, and a robot. Our system continuously analyzes the state of the environment and provides adapted execution commands, enabling the robot to adjust to changing environmental conditions and to correct controller errors. Our system does not require any iterative optimization to learn to accomplish a task, as it leverages in-context learning with an off-the-shelf LLM. Through an extensive validation process involving two standardized industrial robotic units (SCARA and DELTA types), we contribute knowledge about these robots, which are not widely studied in the community. We highlight the generalization capabilities of our system and show that (1) in-context learning combined with current state-of-the-art LLMs is an effective way to implement a robotic controller; (2) in static environments, InCoRo surpasses the prior art in terms of success rate; and (3) in dynamic environments, InCoRo establishes a new state of the art for both the SCARA and DELTA units. This research paves the way towards building reliable, efficient, intelligent autonomous systems that adapt to dynamic environments.
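The closed-loop idea, re-observing the scene before every command, can be illustrated with a toy simulation. This is a sketch under stated assumptions, not InCoRo's controller: `ToyWorld` is a hypothetical 2D stand-in for the robot and camera, `stub_controller` is a simple bounded-step rule standing in for the LLM controller, and the `move_to(...)` action string is invented for illustration. Comparing it with an open-loop variant that observes the scene only once shows why feedback matters when the target moves.

```python
import math
from dataclasses import dataclass


@dataclass
class Scene:
    """Perception output: object name -> (x, y) position in robot coordinates."""
    objects: dict[str, tuple[float, float]]


class ToyWorld:
    """Hypothetical 2D robot + camera; the target drifts after every command."""

    def __init__(self, target=(10.0, 0.0), drift=(0.0, 0.3)):
        self.effector = [0.0, 0.0]
        self.target = list(target)
        self.drift = drift

    def perceive(self) -> Scene:
        return Scene({"bucket": (self.target[0], self.target[1])})

    def execute(self, dx: float, dy: float) -> None:
        self.effector[0] += dx
        self.effector[1] += dy
        # The environment changes between control steps (e.g. a user moves the bucket).
        self.target[0] += self.drift[0]
        self.target[1] += self.drift[1]


def stub_controller(action: str, state: list[float], scene: Scene, max_step: float = 1.0):
    """Stand-in for the LLM controller: a bounded step toward the named object.

    Returns (dx, dy, distance_to_object)."""
    name = action.split("(", 1)[1].rstrip(")")
    tx, ty = scene.objects[name]
    dx, dy = tx - state[0], ty - state[1]
    dist = math.hypot(dx, dy)
    scale = min(1.0, max_step / dist) if dist > 0 else 0.0
    return dx * scale, dy * scale, dist


def run_closed_loop(world: ToyWorld, action: str, tol: float = 0.5, max_iters: int = 100) -> bool:
    """Perceive -> decide -> act, re-observing the scene on every iteration."""
    for _ in range(max_iters):
        dx, dy, dist = stub_controller(action, world.effector, world.perceive())
        if dist <= tol:
            return True
        world.execute(dx, dy)
    return False


def run_open_loop(world: ToyWorld, action: str, tol: float = 0.5, max_iters: int = 100) -> bool:
    """Observe once, then execute the whole plan without feedback."""
    frozen = world.perceive()
    for _ in range(max_iters):
        dx, dy, dist = stub_controller(action, world.effector, frozen)
        if dist <= tol:
            break
        world.execute(dx, dy)
    _, _, final_dist = stub_controller(action, world.effector, world.perceive())
    return final_dist <= tol


print(run_closed_loop(ToyWorld(), "move_to(bucket)"))  # True
print(run_open_loop(ToyWorld(), "move_to(bucket)"))    # False
```

With a static target (`drift=(0.0, 0.0)`) both variants succeed; only the closed loop keeps up when the target drifts, mirroring the static versus dynamic distinction drawn in the abstract.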

Paper Structure

This paper contains 24 sections, 1 equation, 17 figures, 5 tables, and 1 algorithm.

Figures (17)

  • Figure 1: Our system in action. The first two rows show frame sequences collected while the robot executes two actions under dynamic conditions. The top row includes a moving user hand, while in the second row the bucket starts outside the camera's initial field of view. Our system achieves a high success rate on these challenging tasks. The bottom row depicts the main components of our system: user-provided input text, a pre-processor that decomposes the user prompt into atomic actions and objects, and a control loop equipped with a large language model, a perception unit, and a robot. The robot is displayed in the right-most image.
  • Figure 2: Pre-processing diagram. Our pre-processing unit leverages in-context learning to decompose user-provided text into a sequence of atomic actions and a list of objects.
  • Figure 3: Our control loop. The control loop takes as input the list of atomic actions and the set of objects that the pre-processor extracts from the user-defined textual description. The loop consists of three elements: (1) a Large Language Model (LLM) controller that takes as input the atomic task together with the robot's state and the scene description, and outputs low-level robot control commands; (2) a robot that acts in the world; and (3) a scene understanding module that continuously processes images to provide the locations of the objects in the scene. The controller can process multiple feedback operations per second while solving the user-defined task.
  • Figure 4: Example of scene understanding input and output. The two pictures on the left show object detection in the image, and those on the right show the segmentation of these objects. The coordinates of each object are then passed on as a bounding box and a simplified mask. For brevity and readability, the simplified mask in this example is shown as a polygon with four edges.
  • Figure 5: Static setup. Initial (Start) and final (Stop) states of the DELTA robot for two tasks.
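Figure 4 describes each object by a bounding box plus a simplified mask polygon. The sketch below shows one plausible way to turn an object contour into such a compact, text-friendly description. The paper does not state how its masks are simplified; Ramer-Douglas-Peucker simplification is swapped in here as an assumption (OpenCV's `cv2.approxPolyDP` implements the same algorithm), and `describe_object` and its output text format are hypothetical.

```python
import math


def _point_line_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate chord: fall back to point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: drop vertices closer than epsilon to the chord."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # avoid duplicating the split vertex shared by both halves


def describe_object(name, contour, epsilon=1.0):
    """Hypothetical text description: bounding box plus simplified polygon."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    closed = list(contour) + [contour[0]]  # close the ring before simplifying
    polygon = rdp(closed, epsilon)[:-1]   # drop the repeated start vertex
    return f"{name}: bbox={bbox}, polygon={polygon}"


# A 4x4 square traced with one vertex per pixel along its border.
square = ([(x, 0) for x in range(0, 4)] + [(4, y) for y in range(0, 4)]
          + [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
print(describe_object("bucket", square))
# bucket: bbox=(0, 0, 4, 4), polygon=[(0, 0), (4, 0), (4, 4), (0, 4)]
```

Reducing a 16-vertex contour to its 4 corners keeps the scene description short, which matters when it is re-sent to the LLM controller on every loop iteration.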