Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence
Anna Dawid, Yann LeCun
TL;DR
These notes argue that current machine learning struggles with data efficiency and robust world modeling, which hinders progress towards human-like autonomous intelligence. They advocate latent-variable energy-based models and the joint embedding predictive architecture (JEPA), together with its hierarchical variant (H-JEPA), as a scalable path to predictive world models and hierarchical planning, trained with regularized objectives rather than solely with supervised or reinforcement learning signals. The notes survey energy-based models, their training strategies (contrastive and regularized), and classic examples, and illustrate how JEPA and H-JEPA can handle multimodal data and uncertainty to enable autonomous decision-making. If realized, this approach could yield more sample-efficient, reasoning-capable systems with broad impact on autonomous driving, robotics, translation, and scientific modeling.
Abstract
Current automated systems have crucial limitations that need to be addressed before artificial intelligence can reach human-like levels and bring new technological revolutions. Among other things, our societies still lack Level 5 self-driving cars, domestic robots, and virtual assistants that learn reliable world models, reason, and plan complex action sequences. In these notes, we summarize the main ideas behind the architecture for future autonomous intelligence proposed by Yann LeCun. In particular, we introduce energy-based and latent variable models and combine their advantages in the building block of LeCun's proposal, the hierarchical joint embedding predictive architecture (H-JEPA).
