A representational framework for learning and encoding structurally enriched trajectories in complex agent environments

Corina Catarau-Cotutiu, Esther Mondragon, Eduardo Alonso

TL;DR

SETLE introduces Structurally Enriched Trajectories (SETs) and a Hierarchical Memory Graph to represent task execution across objects, interactions, affordances, and states. By encoding SETs with a heterogeneous graph encoder inspired by HeCo and training via contrastive and triplet losses, SETLE learns embeddings that capture cross-episode structure and transfer across CREATE and MiniGrid. Integrating SETLE into a reinforcement learning loop with memory retrieval, adapters, penalties, and soft updates yields improved sample efficiency, stability, and generalisation in both physically grounded and symbolic tasks, despite challenges in sparse-reward settings. The work suggests structured trajectory representations as a path toward lifelong, transferable intelligence that leverages cross-task commonalities beyond perceptual similarity.

Abstract

The ability of artificial intelligence agents to make optimal decisions and generalise them to different domains and tasks is compromised in complex scenarios. One way to address this issue has focused on learning efficient representations of the world and on how the actions of agents affect them in state-action transitions. Whereas such representations are procedurally efficient, they lack structural richness. To address this problem, we propose to enhance the agent's ontology and extend the traditional conceptualisation of trajectories to provide a more nuanced view of task execution. Structurally Enriched Trajectories (SETs) extend the encoding of sequences of states and their transitions by incorporating hierarchical relations between objects, interactions, and affordances. SETs are built as multi-level graphs, providing a detailed representation of the agent dynamics and a transferable functional abstraction of the task. SETs are integrated into an architecture, Structurally Enriched Trajectory Learning and Encoding (SETLE), that employs a heterogeneous graph-based memory structure of multi-level relational dependencies essential for generalisation. We demonstrate that SETLE can support downstream tasks, enabling agents to recognise task-relevant structural patterns across CREATE and MiniGrid environments. Finally, we integrate SETLE with reinforcement learning and show measurable improvements in downstream performance, including breakthrough success rates in complex, sparse-reward tasks.
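
As a rough illustration of what "multi-level graph" means here, the following minimal sketch stores a toy SET as typed nodes and labelled edges. The class name `SETGraph`, the node ids, and the relation labels (`depends_on`, `emerges_from`, `enables`, `leads_to`, `contains`) are hypothetical choices made for this example; they are not taken from the paper's schema or code.

```python
from dataclasses import dataclass, field

# Node types follow the levels named in the abstract; the relation labels
# used below are illustrative assumptions, not the paper's schema.
NODE_TYPES = {"object", "interaction", "affordance", "state", "set"}


@dataclass
class SETGraph:
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, src, relation, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, relation, dst))

    def neighbours(self, node_id, node_type):
        """Ids of nodes of `node_type` that `node_id` points to directly."""
        return [d for s, _, d in self.edges
                if s == node_id and self.nodes[d] == node_type]


def build_toy_set():
    """Build one hypothetical trajectory: two objects, one interaction,
    one affordance, and two states grouped under a single SET node."""
    g = SETGraph()
    for node_id, node_type in [
        ("ball", "object"), ("ramp", "object"),
        ("collide_0", "interaction"), ("redirect_0", "affordance"),
        ("state_0", "state"), ("state_1", "state"), ("set_0", "set"),
    ]:
        g.add_node(node_id, node_type)
    g.add_edge("collide_0", "depends_on", "ball")
    g.add_edge("collide_0", "depends_on", "ramp")
    g.add_edge("redirect_0", "emerges_from", "collide_0")
    g.add_edge("state_0", "enables", "redirect_0")
    g.add_edge("redirect_0", "leads_to", "state_1")
    g.add_edge("set_0", "contains", "state_0")
    g.add_edge("set_0", "contains", "state_1")
    return g
```

Calling `build_toy_set().neighbours("collide_0", "object")` returns the two objects the interaction depends on. A heterogeneous graph encoder would consume node and edge types of this kind, but the encoding step itself is not shown here.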

Paper Structure

This paper contains 62 sections, 13 equations, 34 figures, 3 tables, 3 algorithms.

Figures (34)

  • Figure 1: An example of the Hierarchical Structure of a SET Graph, illustrating Multi-Level Relational Dependencies. At the lowest level, interactions (light-filled circular nodes) depend on objects (darker circular nodes). Moving upward, affordances (smaller darker nodes) emerge from interactions, which then influence states (highlighted in red with distinct borders) which in turn influence the next affordance. At the highest level, the SET node connects to the structure through temporal state dependencies. The edges indicate relationships such as object dependencies, contributions, and effects. Varying line styles help differentiate these relationships. This hierarchical structure showcases how task execution is represented by linking objects, interactions, affordances, and states.
  • Figure 2: Hierarchical Memory building: The image shows the models used to build the hierarchical memory. The base layer extracts objects and interactions using the SAM object detection model and a pre-trained ConvLSTM, capturing fundamental task dynamics. The mid-layer represents states and affordances, modelling how interactions influence future states. At the highest level, the SET layer encodes trajectory-level abstractions, capturing long-term dependencies across sequential states. This hierarchical organisation enables efficient reasoning over task structures and adaptive decision-making.
  • Figure 3: Representation of two objects interacting. Interactions are characterized not only by their execution, but also by the relationships they share with the objects involved. On the right, the parameters of an Interaction are shown, such as its type and vector representation. Representation obtained from the Neo4j graph database interface [lal2015neo4j].
  • Figure 5: Illustration of the second level of the SETLE hierarchy, which consists of states (red; the first set of nodes connected to the middle node) linked by affordances (green; the outermost nodes), showing how transitions between states occur via affordances.
  • Figure 6: Overview of the SETLE encoder training pipeline. Stage 1: A triplet of SETs (an anchor, a positive, and a negative sample) is used as input. Stage 2: All three subgraphs are processed by an encoder with shared parameters, $f(x; \theta)$, which uses the HeCo architecture to generate embeddings from both the Network Schema and Meta-path views. These views are regularized by an internal cross-view contrastive loss. Stage 3: The final embeddings are used to compute the Triplet Loss, which pulls the anchor and positive samples together in the latent space while pushing the negative sample away.
  • ...and 29 more figures
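
Stage 3 of the Figure 6 caption can be sketched with the standard margin-based triplet loss. This is a minimal sketch only: the Euclidean distance, the margin value, and the function names are assumptions made for this example, since the caption does not specify them. The HeCo encoder and its cross-view contrastive loss are omitted, and embeddings are plain lists of floats.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly.
    The distance and margin are assumed here, not taken from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings of three SETs.
anchor = [0.0, 0.0]
similar = [0.0, 1.0]     # distance 1 from the anchor
dissimilar = [3.0, 4.0]  # distance 5 from the anchor

well_separated = triplet_loss(anchor, similar, dissimilar)  # 1 - 5 + 1 < 0, so 0.0
violated = triplet_loss(anchor, dissimilar, similar)        # 5 - 1 + 1 = 5.0
```

In the first call the positive is already much closer to the anchor than the negative, so no gradient signal would be produced; in the second the roles are swapped and the loss is positive, which is the case that pulls the anchor and positive together and pushes the negative away.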