Emergent Neural Automaton Policies: Learning Symbolic Structure from Visuomotor Trajectories

Yiyuan Pan, Xusheng Luo, Hanjiang Hu, Peiqi Yu, Changliu Liu

Abstract

Scaling robot learning to long-horizon tasks remains a formidable challenge. While end-to-end policies often lack the structural priors needed for effective long-term reasoning, traditional neuro-symbolic methods rely heavily on hand-crafted symbolic priors. To address this issue, we introduce ENAP (Emergent Neural Automaton Policy), a framework that allows a bi-level neuro-symbolic policy to emerge adaptively from visuomotor demonstrations. Specifically, we first employ adaptive clustering and an extension of the L* algorithm to infer a Mealy state machine from visuomotor data, which serves as an interpretable high-level planner capturing latent task modes. Then, this discrete structure guides a low-level reactive residual network to learn precise continuous control via behavior cloning (BC). By explicitly modeling the task structure with discrete transitions and continuous residuals, ENAP achieves high sample efficiency and interpretability without requiring task-specific labels. Extensive experiments on complex manipulation and long-horizon tasks demonstrate that ENAP outperforms state-of-the-art (SoTA) end-to-end VLA policies by up to 27% in low-data regimes, while offering a structured representation of robotic intent (Fig. 1).

Paper Structure

This paper contains 52 sections, 1 theorem, 12 equations, 22 figures, 12 tables, 2 algorithms.

Key Result

Proposition B.1

Let $h_1, h_2 \in \mathbb{R}^d$ be two normalized hidden state vectors, and let $\tilde{h}_1, \tilde{h}_2 \in \{\pm 1\}^d$ be their corresponding saturated discrete versions. Assume the RNN operates in an $\epsilon$-saturated regime, such that $\|h_i - \tilde{h}_i\|_2 \leq \epsilon$ for $i \in \{1, 2\}$. If $\|h_1 - h_2\|_2 < 2(1 - \epsilon)$, then the two vectors correspond to the same discrete automaton state (i.e., $\tilde{h}_1 = \tilde{h}_2$), since any two distinct points of $\{\pm 1\}^d$ are at least $2$ apart in $\ell_2$ distance.
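The identifiability criterion above can be illustrated numerically (a minimal sketch; the function names and the specific vectors are mine, not the paper's):

```python
import numpy as np

def discretize(h):
    """Map a hidden state to its saturated sign pattern in {-1, +1}^d."""
    return np.where(h >= 0, 1, -1)

def same_automaton_state(h1, h2, eps):
    """Sufficient condition from Proposition B.1: if both states are
    eps-saturated and ||h1 - h2||_2 < 2(1 - eps), their sign patterns
    must coincide, because flipping even one coordinate of a {-1,+1}^d
    vector changes it by at least 2 in l2 distance."""
    return np.linalg.norm(h1 - h2) < 2 * (1 - eps)

# Two hidden states near the same saturated corner (+1, -1, +1)
h1 = np.array([0.95, -0.90, 0.97])
h2 = np.array([0.90, -0.95, 0.92])
eps = 0.2  # both states lie within eps of their sign patterns

if same_automaton_state(h1, h2, eps):
    # The sufficient condition guarantees identical discrete states.
    assert np.array_equal(discretize(h1), discretize(h2))
```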

Figures (22)

  • Figure 1: Unsupervised Discovery of Task Structures. ENAP successfully recovers task-specific logic, including cyclic dependencies (Top-Left), logical branching for multi-modal goals (Bottom), and crucially, autonomous failure recovery (Top-Right, dashed line), where the automaton learns to transition back to a previous state ($q_2 \to q_1$) to retry insertion upon error. Semantic meanings (e.g., 'Hold') are generated by GPT based on cluster representatives.
  • Figure 2: Overview of the ENAP Framework. Our approach unifies structure discovery and hierarchical control through a three-stage pipeline: (i) Symbol Abstraction: Continuous trajectories from demonstrations are encoded and clustered via HDBSCAN (McInnes et al., 2017) to discover a discrete alphabet $\Sigma$, mapping sensorimotor streams to symbolic sequences. (ii) Structure Extraction via Extended L$^*$: We introduce an extended $L^*$ algorithm that iteratively constructs a Probabilistic Mealy Machine (PMM) by querying the dataset. The process maintains an observation table and enforces closedness (expanding nodes) and consistency (refining edges) constraints using an RNN-based history encoder. (iii) Bi-level Control Pipeline: The learned PMM serves as a high-level planner, providing a coarse action prior $a_{\text{base}}^t$. This is combined with a learned residual term $\Delta a_t\sim\pi_{\psi}(\cdot|q_t,o_t)$ to synthesize the final precise action $\hat{a}_t = a_{\text{base}}^t + \Delta a_t$.
  • Figure 3: Iterative Structure-Policy Co-Evolution. The training process alternates between augmenting the dataset via clustering with the learned encoder (Left) and policy learning on the given dataset (Right). A residual network refines the action prior from the PMM and generates an updated encoder for subsequent iterations.
  • Figure 4: Experiments in Both Simulation and Real-World. (Left) Complex manipulation; (Middle) TAMP; (Right) Real-world scenarios.
  • Figure 5: Data-scaling comparison on DualStackCube. ENAP remains robust in low-data regimes, while other methods degrade significantly.
  • ...and 17 more figures
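The bi-level control pipeline described in Figure 2 can be sketched as follows (a minimal mock: the PMM lookup and the residual policy $\pi_\psi$ are stubbed out, and all names, states, and values here are illustrative, not the paper's API):

```python
import numpy as np

# Hypothetical PMM output function: each discrete state q_t yields a
# coarse base action prior a_base (here, 3-DoF end-effector deltas).
BASE_ACTIONS = {
    "q0": np.array([0.0, 0.0, 0.1]),   # e.g., "approach" mode
    "q1": np.array([0.0, 0.0, -0.1]),  # e.g., "descend/insert" mode
}

def residual_policy(q, obs):
    """Stub for the learned residual network pi_psi(.|q_t, o_t).
    A real network would condition on both q and the observation;
    here we just return a small observation-dependent correction."""
    return 0.05 * np.tanh(obs)

def act(q, obs):
    """Synthesize the final action: a_hat = a_base + delta_a."""
    a_base = BASE_ACTIONS[q]
    delta_a = residual_policy(q, obs)
    return a_base + delta_a

obs = np.array([0.2, -0.1, 0.0])
a = act("q0", obs)  # coarse prior from the PMM, refined by the residual
```

The design point this illustrates: the discrete automaton carries the long-horizon structure (which mode to be in), while the residual network only has to learn small, reactive corrections around each mode's prior.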

Theorems & Definitions (3)

  • Definition 3.1: Probabilistic Mealy Machine
  • Proposition B.1: Identifiability via $\epsilon$-Saturated RNNs
  • Proof (of Proposition B.1)
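For reference, a standard formalization of a probabilistic Mealy machine reads as follows (the paper's Definition 3.1 may differ in notation or details):

```latex
A probabilistic Mealy machine is a tuple $\mathcal{M} = (Q, \Sigma, \Gamma, \delta, \lambda, q_0)$,
where $Q$ is a finite set of states, $\Sigma$ is an input alphabet, $\Gamma$ is an output
alphabet, $\delta : Q \times \Sigma \to \mathcal{D}(Q)$ is a probabilistic transition
function (with $\mathcal{D}(Q)$ the set of probability distributions over $Q$),
$\lambda : Q \times \Sigma \to \Gamma$ is an output function, and $q_0 \in Q$ is the
initial state.
```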