Focusing Robot Open-Ended Reinforcement Learning Through Users' Purposes

Emilio Cartoni, Gianluca Cioccolini, Gianluca Baldassarre

TL;DR

Open-ended learning robots risk unfocused exploration that can waste time on tasks irrelevant to users. The authors propose POEL, a purpose-directed extension that uses speech-to-text, scene analysis, and a large language model to identify objects relevant to a user-defined purpose and bias exploration and rewards toward those objects. Built on prior OEL frameworks (LEXA/ALAN), POEL introduces Proximity and Lifting purpose models and an integrated reward structure to steer skill acquisition toward purpose-related tasks, validated in a simulated camera-arm-gripper setup. The results show POEL outperforms state-of-the-art unfocused exploration on purpose-related tasks and can learn tasks that baseline systems struggle with, signaling a practical path to user-aligned autonomy in unstructured environments.

Abstract

Open-Ended Learning (OEL) autonomous robots can acquire new skills and knowledge through direct interaction with their environment, relying on mechanisms such as intrinsic motivations and self-generated goals to guide learning processes. OEL robots are highly relevant for applications as they can autonomously leverage acquired knowledge to perform tasks beneficial to human users in unstructured environments, addressing challenges unforeseen at design time. However, OEL robots face a significant limitation: their openness may lead them to waste time learning information that is irrelevant to tasks desired by specific users. Here, we propose a solution called 'Purpose-Directed Open-Ended Learning' (POEL), based on the novel concept of 'purpose' introduced in previous work. A purpose specifies what users want the robot to achieve. The key insight of this work is that purpose can focus OEL on learning self-generated classes of tasks that, while unknown during autonomous learning (as typical in OEL), involve objects relevant to the purpose. This concept is operationalised in a novel robot architecture capable of receiving a human purpose through speech-to-text, analysing the scene to identify objects, and using a Large Language Model to reason about which objects are purpose-relevant. These objects are then used to bias OEL exploration towards their spatial proximity and to self-generate rewards that favour interactions with them. The solution is tested in a simulated scenario where a camera-arm-gripper robot interacts freely with purpose-related and distractor objects. For the first time, the results demonstrate the potential advantages of purpose-focused OEL over state-of-the-art OEL methods, enabling robots to handle unstructured environments while steering their learning toward knowledge acquisition relevant to users.
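The idea of biasing rewards toward purpose-relevant objects can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the exponential distance shaping, the lift threshold, and the weights are all assumptions made for illustration, and the set of relevant object names is assumed to come from an upstream step such as the LLM-based reasoning described above.

```python
import math

def proximity_bonus(gripper_xyz, objects, relevant, scale=0.1):
    """Bonus in (0, 1] that grows as the gripper nears the closest relevant object.

    The exponential shaping and the `scale` value are assumptions,
    not taken from the paper.
    """
    dists = [math.dist(gripper_xyz, pos)
             for name, pos in objects.items() if name in relevant]
    if not dists:
        return 0.0  # no purpose-relevant object in the scene
    return math.exp(-min(dists) / scale)

def lifting_bonus(objects, relevant, table_z=0.0, min_lift=0.05):
    """Return 1.0 if any relevant object is at least `min_lift` above the table.

    The threshold is a hypothetical choice for this sketch.
    """
    lifted = any(pos[2] - table_z >= min_lift
                 for name, pos in objects.items() if name in relevant)
    return 1.0 if lifted else 0.0

def purpose_reward(base_reward, gripper_xyz, objects, relevant,
                   w_prox=0.5, w_lift=1.0):
    """Add weighted purpose bonuses to a base (e.g. intrinsic) reward.

    The additive combination and the weights are assumptions.
    """
    return (base_reward
            + w_prox * proximity_bonus(gripper_xyz, objects, relevant)
            + w_lift * lifting_bonus(objects, relevant))

# Hypothetical scene: a blue purpose-related cube and a lifted red distractor.
scene = {"blue": (0.0, 0.0, 0.0), "red": (0.5, 0.0, 0.2)}
reward = purpose_reward(0.0, (0.0, 0.0, 0.0), scene, relevant={"blue"})
```

In this sketch, interactions with the red distractor earn no bonus even though it is lifted, because only objects named in `relevant` contribute to either term.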

Paper Structure

This paper contains 9 sections, 1 equation, 2 figures.

Figures (2)

  • Figure 1: Left: the POEL architecture. Right: the environment, comprising a robotic arm with a gripper, two boxes, and three objects. Either the blue or the green object is purpose-related, while the red object is always a distractor. From left to right: a "reach blue" goal, a "push green" goal, and a "pick and place blue" goal.
  • Figure 2: Performance comparison for different types of goals: reaching, pushing, and pick and place. The pushing and pick-and-place results show averages across the 4 goals for each cube. Values are the mean performance and standard error over 3 training runs per condition, except for the LEXA baseline, which involved 9 runs.