Agential AI for Integrated Continual Learning, Deliberative Behavior, and Comprehensible Models

Zeki Doruk Erden, Boi Faltings

TL;DR

The paper addresses three fundamental limitations of gradient-based ML (inability to learn continually, incomprehensible internal structure, and poor integration with deliberative planning) by introducing Agential AI (AAI), a unified framework with three components: Modelleyen (non-gradient structure learning based on variation and selection, "varsel"), Planlayan (goal-directed planning on a learned model), and Behavior Encapsulation (automatic hierarchical decomposition of learned plans). In a proof-of-principle on a simple finite-state-machine (FSM) environment, it demonstrates that the system can learn continually without destructive adaptation, plan on a learned model without reward-driven relearning, and produce a human-comprehensible, hierarchical representation of behavior. The approach models discrete states with BSVs, DSVs, and CSVs, and introduces upstream conditioning and normalized causal effect (NCE) to manage the complexity and significance of learned relations. Together, these components aim to deliver integrated, controllable AI that reasons over structured environment models and produces interpretable subpolicies. Future work targets continuous spaces, higher-order conditioning, and live integration of behavior encapsulation.

Abstract

The contemporary machine learning paradigm excels in statistical data analysis, solving problems that classical AI could not. However, it faces key limitations, such as a lack of integration with planning, an incomprehensible internal structure, and an inability to learn continually. We present the initial design for an AI system, Agential AI (AAI), which can in principle operate independently of or on top of statistical methods and is designed to overcome these issues. AAI's core is a learning method that models temporal dynamics with guarantees of completeness, minimality, and continual learning, using component-level variation and selection to learn the structure of the environment. It integrates this with a behavior algorithm that plans on a learned model and encapsulates high-level behavior patterns. Preliminary experiments on a simple environment show AAI's effectiveness and potential.

Paper Structure

This paper contains 29 sections, 1 theorem, 3 equations, 11 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

Let $y_i$ be an instance that includes the previous states of all the positive and negative sources of a CSV $C$ and the current states of all its conditioning targets. Then, if $C$ undergoes any modification as a result of an encounter with an instance $y_1$, its state in response to any past instance

Figures (11)

  • Figure 1: Illustration of SV types and relationships. The figure shows BSVs ($B_i$), their DSVs for activation (A) and deactivation (D), and CSVs ($C_i$). Here, CSV $C_0$ takes BSV $B_0$ as positive source and the activation DSV of $B_1$ as negative source, and conditions the CSV $C_1$ as well as the deactivation of $B_2$, modelling "$B_2$ is deactivated and $C_1$ is active if $B_0$ is active and $B_1$ is not activated."
  • Figure 2: Sample formation of a CSV in a continual manner. The relationship to be modelled is $Y = X0\ \text{and}\ !X2$ ("!" denotes "not"). Black and orange arrows represent positive and negative sources for CSV $C0$, respectively. Each $Xi$ can be interpreted either as a single SV or as a group of SVs. (a) Initial state with no relation formed between $X0$ to $X3$ and $Y$. (b) $X0, X1 \rightarrow Y$ is observed. Positive connections are formed, hypothesizing that both $X0$ and $X1$ are required for $Y$. (c) $X0 \rightarrow Y$ is observed. $X1$ is deduced to be unnecessary for $Y$. (d) $X0, X2, X3 \rightarrow !Y$ is observed. $Y$ is hypothesized to be suppressed by $X2$ and $X3$. (e) $X0, X2 \rightarrow !Y$ is observed. $X3$, seen to be unnecessary for the suppression of $Y$, is refined away. The correct structure is learned and is stable from now on.
  • Figure 3: Example of upstream conditioning, continuing from Figure 2. Assume that the unconditionality flag of $C0$ is set following an observation that $(X0,\ !X2)$ did not result in its activation (see main text). (a) $X0, !X2, X4, X5 \rightarrow Y$ is observed. $C0$ is observed to be active, since $X0, !X2$ led to $Y$. A new CSV $C1$ is formed and conditions $C0$. Note that $(X4, X5)$ alone will not predict activation of $C0$ if $C0$’s sources are not also active. (b) New conditioners are also subject to the CSV processes: here, the source $X5$ of $C1$ has been refined, and new conditioners $C2$ and $C3$ are formed. Multiple conditioners represent alternative paths: in this case, $C0$ is expected to be active when the sources of either $C1$ or $C2$ are active. Any logical function can hence be incorporated in a conditioning pathway in a minimal and ongoing manner without destroying past knowledge.
  • Figure 4: Illustration of the step-by-step upstream generation of an action network, operating on different SV types. BX, CX and GX stand for BSV, CSV and GSV nodes respectively, (A) for activation, (0) for the nonactive state. Black arrows are positive sources and precondition targets; green arrows are constituent (dashed) and constituency (solid) relations. The node that is extended at each step is highlighted in red. (a) Step 1. CSV C0 is opened. For CSVs, their upstream conditioners (C1) and sources (G0, B0(A)) are expanded. (b) Steps 2-4. Each step opens up one of the sources of the previous step. For GSVs (G0), constituents (B2, B3), constituencies (G1) and precondition events (G0(A)) are opened. For DSVs (B0(A)), their precondition states (B0(0)) and their conditioners (C2) are opened. Possible interrelations (e.g. B2 for C1, G0) do not need reopening if they already exist.
  • Figure 5: Illustrative example of the aim of the behavior encapsulation process. On the left are two action networks (ANs) that represent two alternative pathways, split from the unified AN generated by Planner (node names are placeholders and can be of any SV type and target effect). We want to encapsulate the pathways between X and Z. To do so, all pathways that are reliably present in all (here, both) networks are identified, and a new encapsulated AN (EAN) is formed with them (right). Each encapsulated edge (dashed) in the EAN includes copies of the subnetworks that corresponded to this pathway in the original AN variants, which can be further encapsulated in subgroups via a recursive call (for example, edge (D0,Y) would include two pathways: the first formed only of E0, the second of C2 and E1). The EAN on the right can be regarded as the subpolicy for the realization of Z from X.
  • ...and 6 more figures
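
The refinement steps described in the Figure 2 caption can be illustrated with a small toy sketch. This is not the paper's Modelleyen algorithm: the class `CSVSketch`, its methods, and the set-intersection update rule are hypothetical names and simplifications introduced here, and the sketch ignores DSVs, upstream conditioning, and the unconditionality flag. It assumes that positive sources are narrowed by intersecting the active sets seen when $Y$ is active, and that negative sources are narrowed the same way over the extra active variables seen when $Y$ is absent despite all positive sources being active.

```python
from typing import Optional, Set


class CSVSketch:
    """Toy stand-in for one CSV (hypothetical; not the paper's implementation)."""

    def __init__(self) -> None:
        self.pos: Optional[Set[str]] = None  # positive sources, None until first seen
        self.neg: Optional[Set[str]] = None  # negative sources, None until first seen

    def observe(self, active: Set[str], target_active: bool) -> None:
        if target_active:
            # Keep only the sources present in every positive observation.
            self.pos = set(active) if self.pos is None else self.pos & active
        elif self.pos is not None and self.pos <= active:
            # Target absent although all positive sources are active:
            # hypothesize (then refine) suppressors among the extra variables.
            extra = active - self.pos
            self.neg = set(extra) if self.neg is None else self.neg & extra

    def predict(self, active: Set[str]) -> bool:
        if self.pos is None or not self.pos <= active:
            return False
        # Assumption: suppression requires all remaining negative sources.
        suppressed = bool(self.neg) and self.neg <= active
        return not suppressed


# Replay observations (b) to (e) from the Figure 2 caption.
c0 = CSVSketch()
for active, y_active in [
    ({"X0", "X1"}, True),         # (b) X0, X1 -> Y
    ({"X0"}, True),               # (c) X0 -> Y, so X1 is dropped
    ({"X0", "X2", "X3"}, False),  # (d) X0, X2, X3 -> !Y
    ({"X0", "X2"}, False),        # (e) X0, X2 -> !Y, so X3 is dropped
]:
    c0.observe(active, y_active)

print(c0.pos, c0.neg)  # {'X0'} {'X2'}
```

After the four observations the toy model holds $X0$ as its only positive source and $X2$ as its only negative source, matching the target relation $Y = X0\ \text{and}\ !X2$ in the caption. Each update only shrinks a source set, which loosely mirrors the caption's point that refinement proceeds without rebuilding the structure from scratch.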

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1