The Autonomy-Alignment Problem in Open-Ended Learning Robots: Formalising the Purpose Framework

Gianluca Baldassarre; Richard J. Duro; Emilio Cartoni; Mehdi Khamassi; Alejandro Romero; Vieri Giuliano Santucci

TL;DR

The paper tackles the autonomy–alignment problem for open‑ended learning robots by introducing a purpose‑centred framework that separates human aims from task‑level objectives and grounds them in domain‑specific goals. It provides a formal, three‑level model (human, robot, domain) with triangular alignment, a four‑part decomposition (arbitration, human–robot alignment, grounding, competence acquisition), and rigorous necessary/sufficient conditions for alignment, accompanied by illustrative scenarios. A taxonomy of robot purposes (needs vs missions; intrinsic vs extrinsic; homeostatic/instrumental) and a GC‑RL framing further structure how autonomous exploration can be steered toward human values. The framework lays a theoretical foundation for designing autonomous, safe, and human‑aligned robots capable of long‑horizon learning across unstructured environments, while identifying key avenues for algorithmic realization, ethical integration, and multi‑domain grounding in future work.

Abstract

The rapid advancement of artificial intelligence is enabling the development of increasingly autonomous robots capable of operating beyond engineered factory settings and into the unstructured environments of human life. This shift raises a critical autonomy-alignment problem: how to ensure that a robot's autonomous learning focuses on acquiring knowledge and behaviours that serve human practical objectives while remaining aligned with broader human values (e.g., safety and ethics). This problem remains largely underexplored and lacks a unifying conceptual and formal framework. Here, we address one of its most challenging instances: open-ended learning (OEL) robots, which autonomously acquire new knowledge and skills through interaction with the environment, guided by intrinsic motivations and self-generated goals. We propose a computational framework, introduced qualitatively and then formalised, to guide the design of OEL architectures that balance autonomy with human control. At its core is the novel concept of purpose, which specifies what humans (designers or users) want the robot to learn, do, or avoid, independently of specific task domains. The framework decomposes the autonomy-alignment problem into four tractable sub-problems: the alignment of robot purposes (hardwired or learnt) with human purposes; the arbitration between multiple purposes; the grounding of abstract purposes into domain-specific goals; and the acquisition of competence to achieve those goals. The framework supports formal definitions of alignment across multiple cases and proofs of necessary and sufficient conditions under which alignment holds. Illustrative hypothetical scenarios showcase the applicability of the framework for guiding the development of purpose-aligned autonomous robots.

Paper Structure (95 sections, 74 equations, 5 figures, 3 tables)

Introduction
Main issues addressed by the literature on alignment
Value specification and misalignment
Learning human preferences, and their inconsistency
Robustness and distributional shift
Interpretability and explainability
Corrigibility and error recovery
Scalable oversight
Multi-agent and social alignment
Ethical and legal compliance
Reward hacking and instrumental convergence
Long-term and open-ended behaviour
Limitations
Open-ended learning
Open-ended learning and its limitations
...and 80 more sections

Figures (5)

Figure 1: The figure shows a Venn diagram representing the space of possible desired (prescriptive), instrumental, and undesired (proscriptive) outcomes of robot actions, and highlights their relationship to robot autonomy ('autonomy zone' area).
Figure 2: Main elements of the purpose framework. The framework is structured into three levels. Level 1 involves humans (acting as designers or users), who possess domain-independent purposes and domain-dependent goals. Level 2 concerns the robot, endowed with domain-independent purposes, either hardwired (needs) or learnt (missions), and domain-dependent goals. Level 3 comprises the domains, each characterised by state goals corresponding to robot and human goals. A triangular alignment occurs when a human purpose and its corresponding human goal, and a robot purpose and its corresponding robot goal, converge on the same world state goal, indicating coherent alignment between human and robot objectives.
Figure 3: Illustrative example of key elements in the purpose framework. The framework is organised across three main levels: human, robot, and environment. Both a human prescriptive purpose and a proscriptive purpose are represented. Each robot possesses two purposes, integrated within a motivational space characterised by a composite utility function that assigns desirability to purpose points. Robot purposes are grounded in domain-specific goals. The robots can pursue the same purposes across two distinct domains.
Figure 4: Main elements of the purpose framework and associated symbols. Multiple domains are considered (here two are shown, in green). The human hosts several purpose spaces (one shown, in yellow), each abstracting observations and including a purpose and a utility gradient over its points. Each purpose corresponds to distinct human goals across domains, themselves subsets of observations, inheriting utility from the purpose. Human goals map to sets of states (state goals) in the domains. Multiple robots may serve human purposes (one shown). The robot hosts multiple purpose spaces, including missions, learnt purposes inheriting utility from related human purposes, and needs, hardwired purposes with primitive hardwired utility. Robot purposes correspond to robot goals across domains. Dotted lines illustrate that for effective service, robot goals should align with human goals through state goals in the environment, thus grounding triangular alignment.
Figure 5: Illustrative scenario: user-driven adjustment of mission utilities and priorities. The robot has two purposes: a mission related to human proximity and a homeostatic need related to energy. The environment includes a human, a battery charger, and alternating day/night conditions. (A) Initial mission prioritises visiting the human during the day. The robot satisfies the mission but risks battery depletion and disturbs the user at night. (B) After observing misalignments, the user reconfigures the mission: its priority is lowered and a proscriptive purpose penalises nighttime proximity to the human. The robot learns to better balance both purposes.
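
The Figure 5 scenario can be made concrete with a toy sketch of how lowering a mission's priority and adding a proscriptive penalty might change the robot's choice. This is not the paper's formalism: the weighted-sum composite utility, the utility shapes, the numeric values, and all names (`Purpose`, `proximity_utility`, `energy_utility`, `composite_utility`, `choose`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Purpose:
    """A robot purpose with a priority weight (hypothetical structure)."""
    name: str
    priority: float


def proximity_utility(distance: float) -> float:
    """Mission utility: higher when the robot is closer to the human (assumed form)."""
    return 1.0 / (1.0 + distance)


def energy_utility(battery: float) -> float:
    """Homeostatic need utility: the battery level clipped to [0, 1] (assumed form)."""
    return max(0.0, min(1.0, battery))


def composite_utility(distance, battery, is_night, mission, need, night_penalty=0.0):
    """Weighted sum of purpose utilities minus a proscriptive night-time penalty.

    The weighted-sum form is an assumption, chosen only because it is simple.
    """
    mission_term = mission.priority * proximity_utility(distance)
    need_term = need.priority * energy_utility(battery)
    # The proscriptive purpose applies only at night and grows with proximity.
    penalty = night_penalty * proximity_utility(distance) if is_night else 0.0
    return mission_term + need_term - penalty


def choose(options, is_night, mission, need, night_penalty=0.0):
    """Return the option name whose (distance, battery) outcome has the highest utility."""
    return max(
        options,
        key=lambda name: composite_utility(
            *options[name], is_night, mission, need, night_penalty
        ),
    )


# Hypothetical outcomes: (distance to human, resulting battery level).
night_options = {"visit_human": (0.0, 0.2), "go_to_charger": (4.0, 1.0)}
day_options = {"visit_human": (0.0, 0.9), "go_to_charger": (4.0, 1.0)}

need = Purpose("energy", priority=1.0)

# (A) Initial configuration: high mission priority, no proscriptive penalty.
mission_a = Purpose("human_proximity", priority=3.0)
choice_a_night = choose(night_options, True, mission_a, need)

# (B) Reconfigured: lower mission priority plus a night-time proximity penalty.
mission_b = Purpose("human_proximity", priority=1.0)
choice_b_night = choose(night_options, True, mission_b, need, night_penalty=2.0)
choice_b_day = choose(day_options, False, mission_b, need, night_penalty=2.0)
```

With these assumed numbers, configuration (A) sends the robot to the human at night despite a low battery, while configuration (B) sends it to the charger at night and still lets it visit the human during the day. In the scenario the robot learns this balance, whereas the sketch only compares fixed outcomes.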
