Knowledge, Rules and Their Embeddings: Two Paths towards Neuro-Symbolic JEPA

Yongchao Huang, Hassan Raza

Abstract

Modern self-supervised predictive architectures excel at capturing complex statistical correlations from high-dimensional data but lack mechanisms to internalize verifiable human logic, leaving them susceptible to spurious correlations and shortcut learning. Conversely, traditional rule-based inference systems offer rigorous, interpretable logic but suffer from discrete boundaries and NP-hard combinatorial explosion. To bridge this divide, we propose a bidirectional neuro-symbolic framework centered around Rule-informed Joint-Embedding Predictive Architectures (RiJEPA). In the first direction, we inject structured inductive biases into JEPA training via Energy-Based Constraints (EBC) and a multi-modal dual-encoder architecture. This fundamentally reshapes the representation manifold, replacing arbitrary statistical correlations with geometrically sound logical basins. In the second direction, we demonstrate that by relaxing rigid, discrete symbolic rules into a continuous, differentiable logic, we can bypass traditional combinatorial search for new rule generation. By leveraging gradient-guided Langevin diffusion within the rule energy landscape, we introduce novel paradigms for continuous rule discovery, which enable unconditional joint generation, conditional forward and abductive inference, and marginal predictive translation. Empirical evaluations on both synthetic topological simulations and a high-stakes clinical use case confirm the efficacy of our approach. Ultimately, this framework establishes a powerful foundation for robust, generative, and interpretable neuro-symbolic representation learning.
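The abstract's "gradient-guided Langevin diffusion within the rule energy landscape" can be illustrated with a minimal sketch. It assumes a toy quadratic energy $E(z) = \tfrac{1}{2}\lVert z - c \rVert^2$ with a single basin at $c$, standing in for the paper's learned rule energy, which is not specified here. The names `energy_grad`, `langevin_sample`, `center`, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def energy_grad(z, center):
    # Gradient of the toy energy E(z) = 0.5 * ||z - center||^2.
    # Assumption: a stand-in for the gradient of a learned rule energy.
    return z - center

def langevin_sample(center, n_chains=512, n_steps=500, step=0.05, seed=0):
    # Unadjusted Langevin dynamics:
    #   z <- z - step * grad E(z) + sqrt(2 * step) * noise
    rng = np.random.default_rng(seed)
    # Start all chains far from the basin to show that they drift into it.
    z = rng.normal(size=(n_chains, center.shape[0])) * 3.0
    for _ in range(n_steps):
        noise = rng.normal(size=z.shape)
        z = z - step * energy_grad(z, center) + np.sqrt(2.0 * step) * noise
    return z

center = np.array([2.0, -1.0])      # hypothetical low-energy basin
samples = langevin_sample(center)   # shape (512, 2)
print(samples.mean(axis=0))         # sample mean should lie near `center`
```

The gradient term pulls each chain toward the low-energy basin while the injected noise keeps the chains spread out, so the final states approximate samples from the density proportional to $\exp(-E(z))$ rather than collapsing onto a single point. In a setting like the one the abstract describes, the analytic gradient would presumably be replaced by automatic differentiation through a learned energy.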

Paper Structure

This paper contains 65 sections, 19 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: The three-step pipeline for Rule-Based JEPA (RbJEPA), progressing from raw rule extraction to structured encoding and finally to representation learning via predictive loss minimization.
  • Figure 2: The Multi-Modal Dual-Encoder Architecture bridging continuous data and discrete logic. Continuous data contexts ($x$) and symbolic rule antecedents ($A$) are encoded on the left and converge on a universal predictor $g$. The predictions are compared against target encodings of data ($y$) and consequents ($C$) generated on the right, enabling joint minimization of data loss ($\mathcal{L}_{JEPA}$) and rule energy constraints ($\mathcal{L}_{EBC}$). SG denotes stop gradient.
  • Figure 3: Geometric illustration of the Energy-Based Constraints (EBC). Valid rule antecedents (gray boxes) are pulled into stable, low-energy basins corresponding to their correct consequents. Corrupted rule pairings are actively repelled into high-energy regions via the contrastive margin.
  • Figure 4: Energy landscapes of Classic JEPA and RiJEPA over the feature space. While the baseline learns an unstructured, flat functional mapping, RiJEPA carves deep, logically valid energy basins while actively repelling negative OOD rules.
  • Figure 5: Mean energy assignment by latent region. RiJEPA successfully applies a massive energy penalty to invalid rules outside the data distribution.
  • ...and 4 more figures