Cross-Domain Demo-to-Code via Neurosymbolic Counterfactual Reasoning

Jooyoung Kim, Wonje Choi, Younguk Song, Honguk Woo

Abstract

Recent advances in Vision-Language Models (VLMs) have enabled video-instructed robotic programming, allowing agents to interpret video demonstrations and generate executable control code. We formulate video-instructed robotic programming as a cross-domain adaptation problem, where perceptual and physical differences between demonstration and deployment induce procedural mismatches. However, current VLMs lack the procedural understanding needed to reformulate causal dependencies and achieve task-compatible behavior under such domain shifts. We introduce NeSyCR, a neurosymbolic counterfactual reasoning framework that enables verifiable adaptation of task procedures, providing a reliable synthesis of code policies. NeSyCR abstracts video demonstrations into symbolic trajectories that capture the underlying task procedure. Given deployment observations, it derives counterfactual states that reveal cross-domain incompatibilities. By exploring the symbolic state space with verifiable checks, NeSyCR proposes procedural revisions that restore compatibility with the demonstrated procedure. NeSyCR achieves a 31.14% improvement in task success over the strongest baseline, Statler, showing robust cross-domain adaptation across both simulated and real-world manipulation tasks.
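
To make the idea of replaying a demonstrated procedure under deployment conditions concrete, the following is a toy sketch and not the paper's implementation. It assumes a STRIPS-style symbolic model in which each action has preconditions, add effects, and delete effects. Every name here (`Action`, `apply`, `repair`, `adapt`, and the drawer facts) is hypothetical and chosen only for illustration. The demonstrated plan is replayed from the deployment state; when a step's preconditions fail, a bounded breadth-first search looks for repair actions that restore them.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


def apply(state, action):
    """Return the successor state, or None if a precondition fails."""
    if not action.pre <= state:
        return None
    return (state - action.delete) | action.add


def repair(state, goal_facts, actions, max_depth=4):
    """Bounded breadth-first search for actions after which goal_facts hold."""
    queue = deque([(state, [])])
    seen = {state}
    while queue:
        cur, path = queue.popleft()
        if goal_facts <= cur:
            return path
        if len(path) >= max_depth:
            continue
        for a in actions:
            nxt = apply(cur, a)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [a]))
    return None


def adapt(state, plan, actions):
    """Replay the demonstrated plan, inserting repairs where a step is not executable."""
    state = frozenset(state)
    adapted = []
    for step in plan:
        if apply(state, step) is None:  # demonstrated step fails in deployment
            fix = repair(state, step.pre, actions)
            if fix is None:
                raise ValueError(f"no repair found before {step.name}")
            for a in fix:
                state = apply(state, a)
                adapted.append(a)
        state = apply(state, step)
        adapted.append(step)
    return [a.name for a in adapted]


f = frozenset
ACTIONS = [
    Action("open_drawer", f({"hand_empty", "drawer_closed"}),
           f({"drawer_open"}), f({"drawer_closed"})),
    Action("pick_cup", f({"hand_empty", "cup_on_table"}),
           f({"holding_cup"}), f({"hand_empty", "cup_on_table"})),
    Action("place_cup_in_drawer", f({"holding_cup", "drawer_open"}),
           f({"cup_in_drawer", "hand_empty"}), f({"holding_cup"})),
    Action("put_cup_down", f({"holding_cup"}),
           f({"cup_on_table", "hand_empty"}), f({"holding_cup"})),
]

# Demonstration: the drawer was already open, so only two steps were shown.
demo_plan = [ACTIONS[1], ACTIONS[2]]
# Deployment: the drawer starts closed.
deploy_state = {"hand_empty", "cup_on_table", "drawer_closed"}

result = adapt(deploy_state, demo_plan, ACTIONS)
print(result)
```

In this toy, the repair is local: the mismatch is only discovered when the cup is already in hand, so the adapted plan puts the cup down, opens the drawer, and picks the cup up again. Reordering the procedure globally (opening the drawer first) would require reasoning over the whole trajectory rather than patching one step at a time.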

Paper Structure (78 sections, 11 equations, 26 figures, 16 tables, 2 algorithms)

Figures (26)

  • Figure 1: Overview of $\textsc{NeSyCR}$ in a drawer-organizing task scenario. (Left) Illustration of the domain gap between the demonstration and deployment. (Middle) Overview of the $\textsc{NeSyCR}$ framework, which generates an adapted procedure via neurosymbolic counterfactual reasoning. (Right) Outcome of the adapted procedure, showing that $\textsc{NeSyCR}$ successfully executes the task via a grounded code policy.
  • Figure 2: Symbolic world model construction
  • Figure 3: Neurosymbolic counterfactual adaptation
  • Figure 4: Analysis of (a) domain gap and (b) task complexity gap
  • Figure 5: Visualization of a cross-domain demo-to-code task
  • ...and 21 more figures