Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science

Emmanuel Dupoux, Yann LeCun, Jitendra Malik

Abstract

We critically examine the limitations of current AI models in achieving autonomous learning and propose a learning architecture inspired by human and animal cognition. The proposed framework integrates learning from observation (System A) and learning from active behavior (System B) while flexibly switching between these learning modes as a function of internally generated meta-control signals (System M). We discuss how this could be built by taking inspiration from how organisms adapt to real-world, dynamic environments across evolutionary and developmental timescales.

Paper Structure (22 sections, 3 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Standard machine learning (left). The machine does not learn by itself; it requires an assembly line of research engineers and data scientists collecting, formatting, and curating different kinds of data, each used to train successively different components of the model, each with specifically engineered loss and reward functions. The machine is then left with no ability to learn from its experience. Autonomous machine learning (right). The agent learns directly in interaction with the world; the sources of data are generated by the agent itself through different learning modes (learning by observation and by action, which can be extended to higher modes like learning by verbal interaction or self-play). Our proposed architecture includes a meta-controller enabling learning while operating in the real world. (Drawings from ChatGPT).
  • Figure 2: Summary of the modes of interaction between Systems A and B. System A provides System B with predictions of future states conditioned on past states and actions, with hierarchical abstractions over possible actions, and with an SSL loss that can be used for curiosity/exploration. System B, through its actions, provides rich and task-relevant input for System A to learn from.
  • Figure 3: Interactions between learning modes for imitation learning. (a) Self-play. System B provides action and state trajectories to System A, which learns a world model and provides a prediction-based intrinsic reward signal to System B. (b) Social observation. System B directs attention to peers that provide System A with complex trajectories from which it infers latent actions. (c) Retargeted imitation. System A learns to map exocentric actions and states to egocentric ones, helping System B to achieve goal-directed behavior. (Image from ChatGPT).
  • Figure 4: Blueprint of a cognitive architecture featuring System M as an autonomous orchestrator. System M acts as a central control plane that automates data routing and training recipes. High-bandwidth data streams (e.g., plain arrows) carry raw sensory inputs, motor commands, and latent representations between System A (perception/world modeling), System B (action/policy), and an episodic memory buffer. Low-bandwidth control streams (e.g., thin dashed arrows) carry telemetry: System M monitors internal meta-states (such as prediction errors or uncertainty) and outputs routing commands (meta-actions) to dynamically open or close data pathways, effectively assembling and disassembling learning and inference pipelines on the fly.
  • Figure 5: Evo/Devo framework for building autonomous learning agents. Learning takes place at two scales. At the developmental scale, the learner's architecture (A, B and M) is initialized from the meta-parameter $\phi$. A and B update their parameters through interaction with the environment, controlled by a fixed controller M. At the evolutionary scale, $\phi$ is updated to optimize a fitness function $\mathcal{L}$ measured over the life cycle of the system. (Images from ChatGPT).
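
The two-scale scheme described in the Figure 5 caption can be illustrated with a toy sketch. This is not the authors' implementation; every name and design choice here is an assumption made for illustration. The learner is reduced to a single scalar parameter `theta`, the environment to a quadratic loss around a hypothetical `target`, the meta-parameter $\phi$ to the pair `(theta0, lr)`, and the evolutionary update to a simple elitist random search standing in for whatever outer optimizer one would actually use.

```python
import random


def lifetime(phi, target=3.0, steps=20):
    """Developmental scale: one life cycle of a toy learner (illustrative only).

    phi = (theta0, lr) is the meta-parameter. It sets the initial parameter
    and the learning rate, and stays fixed during the life cycle, playing
    the role of the fixed controller in this toy setting.
    Returns the loss summed over the life cycle (lower means fitter).
    """
    theta, lr = phi
    total = 0.0
    for _ in range(steps):
        error = theta - target
        total += error * error
        # Gradient step on the squared error (theta - target)^2.
        theta -= lr * 2.0 * error
    return total


def evolve(phi, generations=200, sigma=0.1, seed=0):
    """Evolutionary scale: elitist random search over phi (an assumed stand-in).

    Each generation perturbs the current best phi with Gaussian noise and
    keeps the child only if its lifetime loss is strictly lower.
    """
    rng = random.Random(seed)
    best, best_loss = phi, lifetime(phi)
    for _ in range(generations):
        child = tuple(p + rng.gauss(0.0, sigma) for p in best)
        loss = lifetime(child)
        if loss < best_loss:
            best, best_loss = child, loss
    return best, best_loss


if __name__ == "__main__":
    phi0 = (0.0, 0.05)  # hypothetical starting meta-parameter
    print("lifetime loss before evolution:", lifetime(phi0))
    print("evolved phi and lifetime loss:", evolve(phi0))
```

The inner function never modifies `phi`, and the outer loop only sees the fitness accumulated over a whole life cycle, which mirrors the separation of timescales in the caption. Because the search is elitist, the evolved meta-parameter is never worse than the starting one under this fitness measure.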