Continual learning and refinement of causal models through dynamic predicate invention

Enrique Crespo-Fernandez, Oliver Ray, Telmo de Menezes e Silva Filho, Peter Flach

TL;DR

This work proposes a framework for constructing symbolic causal world models entirely online by integrating continuous model learning and repair into the agent's decision loop. It uses Meta-Interpretive Learning and predicate invention to find semantically meaningful and reusable abstractions.

Abstract

Efficiently navigating complex environments requires agents to internalize the underlying logic of their world, yet standard world modelling methods often struggle with sample inefficiency, lack of transparency, and poor scalability. We propose a framework that constructs symbolic causal world models entirely online by integrating continuous model learning and repair into the agent's decision loop. The framework uses Meta-Interpretive Learning and predicate invention to find semantically meaningful and reusable abstractions, allowing an agent to construct a hierarchy of disentangled, high-quality concepts from its observations. We demonstrate that our lifted inference approach scales to domains with complex relational dynamics, where propositional methods suffer from combinatorial explosion, while achieving sample efficiency orders of magnitude higher than the established PPO neural-network-based baseline.

Paper Structure (15 sections, 7 equations, 3 figures, 2 algorithms)

Figures (3)

  • Figure 1: Visualization of a selected set of rules from the learnt symbolic causal model on the MiniHack 'Lava Crossing' task. The environment is modelled as a hierarchy of interpretable concepts, transition rules, and physical constraints. (Left) State Interpretation: Colored overlays illustrate how the Learnt Abstractions (Right) ground to specific regions of the state space. The agent ($p4$, purple) senses its neighborhood ($p3$, cyan). The concept of "moving" ($p2$, orange) is reused in two rules: the one modelling movement ($p1$) and the one modelling death ($p5$). (Center) Dynamics & Constraints: The Learnt Dynamics use these high-level abstractions to predict state evolution. For example, the dying rule is triggered only when the abstract condition $p5$ (moving into lava) is met. Learnt Constraints enforce physical consistency, such as mutual exclusion ($\otimes$), ensuring the agent cannot be simultaneously alive and dead or occupy multiple coordinates.
  • Figure 2: Our system vs PPO on the 10$\times$10 grid version of the MiniHack 'Lava Crossing' task. (a) Our system converges to 43 clauses (28 abstractions, 13 dynamics, 2 constraints) by step 23. (b) Our system first reaches the goal at episode 2 and from then on navigates to it consistently by relying on its learnt model; PPO requires 129 episodes for its first success and does not start to converge until episode 300.
  • Figure 3: Lattice traversal dynamics. Prediction errors trigger Generalization (moving to $\top$) or Specialization (pruning to $H$).
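
The mutual-exclusion constraints described in the Figure 1 caption (an agent cannot be both alive and dead, nor occupy multiple coordinates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the fact encoding as tuples, the predicate names `alive`, `dead` and `at`, and every function name below are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical encoding: a ground fact is a tuple (predicate, entity, *args),
# e.g. ("alive", "agent") or ("at", "agent", "3", "4").
Fact = Tuple[str, ...]


def violates_mutex(state: Iterable[Fact], pred_a: str, pred_b: str) -> bool:
    """Return True if some entity satisfies both predicates at once."""
    facts = list(state)
    entities_a = {f[1] for f in facts if f[0] == pred_a}
    entities_b = {f[1] for f in facts if f[0] == pred_b}
    return bool(entities_a & entities_b)


def violates_unique_position(state: Iterable[Fact], pred: str = "at") -> bool:
    """Return True if some entity is placed at two different coordinates."""
    seen: Dict[str, Tuple[str, ...]] = {}
    for fact in state:
        if fact[0] != pred:
            continue
        entity, coord = fact[1], fact[2:]
        if entity in seen and seen[entity] != coord:
            return True
        seen[entity] = coord
    return False


def is_consistent(state: Iterable[Fact]) -> bool:
    """Check a state against the two illustrative constraints."""
    facts = list(state)
    return (not violates_mutex(facts, "alive", "dead")
            and not violates_unique_position(facts))


valid_state = {("alive", "agent"), ("at", "agent", "3", "4")}
alive_and_dead = {("alive", "agent"), ("dead", "agent")}
two_positions = {("at", "agent", "3", "4"), ("at", "agent", "3", "5")}
```

Here `is_consistent(valid_state)` holds, while `alive_and_dead` and `two_positions` are each rejected. A learnt model with such constraints could use a check of this kind to discard predicted next states that are physically impossible; how the paper's system actually represents and enforces its constraints is not specified in this summary.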