ILCL: Inverse Logic-Constraint Learning from Temporally Constrained Demonstrations

Minwoo Cho, Jaehwi Jang, Daehyung Park

TL;DR

ILCL reframes temporal-constraint learning as a two-player zero-sum game between GA-TL-Mining and Logic-CRL to recover transferable TLTL constraints from demonstrations. GA-TL-Mining performs free-form TLTL syntax-tree mining, while Logic-CRL trains policies that maximize rewards under the discovered TLTL constraints, using a PCMDP and a constraint-redistribution scheme to address non-Markovian, sparse evaluations. Across four simulated benchmarks and a real-world peg-in-shallow-hole transfer, ILCL yields the lowest constraint-violation rates with expert-like rewards and demonstrates robust generalization to unseen environments; ablations confirm the necessity of constraint redistribution. The learned temporal constraints are interpretable and transferable, enabling constrained, high-reward robot behavior across diverse tasks and in real-world settings.

Abstract

We aim to solve the problem of temporal-constraint learning from demonstrations to reproduce demonstration-like logic-constrained behaviors. Learning logic constraints is challenging due to the combinatorially large space of possible specifications and the ill-posed nature of non-Markovian constraints. To address these challenges, we introduce a novel temporal-constraint learning method, which we call inverse logic-constraint learning (ILCL). Our method frames inverse constraint learning (ICL) as a two-player zero-sum game between 1) genetic algorithm-based temporal-logic mining (GA-TL-Mining) and 2) logic-constrained reinforcement learning (Logic-CRL). GA-TL-Mining efficiently constructs syntax trees for parameterized truncated linear temporal logic (TLTL) without predefined templates. Subsequently, Logic-CRL finds a policy that maximizes task rewards under the constructed TLTL constraints via a novel constraint redistribution scheme. Our evaluations show that ILCL outperforms state-of-the-art baselines in learning and transferring TL constraints on four temporally constrained tasks. We also demonstrate successful transfer to real-world peg-in-shallow-hole tasks.
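The abstract describes TLTL constraints as non-Markovian: whether a trajectory satisfies one cannot be decided from a single state. The following is a minimal sketch of that property, assuming the common min/max quantitative (robustness) semantics for temporal logic on finite traces. It is not the authors' implementation; the class names (`Less`, `Not`, `And`, `Eventually`, `Always`), the function `robustness`, and the toy trajectory are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Union

State = Sequence[float]
Trajectory = Sequence[State]


@dataclass(frozen=True)
class Less:
    """Atomic predicate s[dim] < theta (hypothetical node type)."""
    dim: int      # zero-based state index (an assumption of this sketch)
    theta: float  # threshold parameter


@dataclass(frozen=True)
class Not:
    child: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Eventually:
    child: "Formula"


@dataclass(frozen=True)
class Always:
    child: "Formula"


Formula = Union[Less, Not, And, Eventually, Always]


def robustness(phi: Formula, traj: Trajectory, t: int = 0) -> float:
    """Robustness of phi on the finite trajectory suffix starting at step t.

    A positive value means the suffix satisfies phi; a negative value
    means it violates phi.
    """
    if isinstance(phi, Less):
        return phi.theta - traj[t][phi.dim]
    if isinstance(phi, Not):
        return -robustness(phi.child, traj, t)
    if isinstance(phi, And):
        return min(robustness(phi.left, traj, t),
                   robustness(phi.right, traj, t))
    if isinstance(phi, Eventually):
        # Best case over all remaining steps of the finite trace.
        return max(robustness(phi.child, traj, k) for k in range(t, len(traj)))
    if isinstance(phi, Always):
        # Worst case over all remaining steps of the finite trace.
        return min(robustness(phi.child, traj, k) for k in range(t, len(traj)))
    raise TypeError(f"unsupported node: {phi!r}")


# "Always keep s[1] below 1.5, and eventually reach s[0] >= 0.9."
phi = And(Always(Less(1, 1.5)), Eventually(Not(Less(0, 0.9))))
traj = [(0.0, 1.0), (0.5, 0.4), (1.0, 0.2)]

print(robustness(phi, traj) > 0)      # full trajectory satisfies phi
print(robustness(phi, traj[:2]) > 0)  # the two-step prefix does not
```

The same formula gives opposite verdicts on the full trajectory and on its two-step prefix, so the verdict is only available once the whole trajectory is known. This is one way to read the "non-Markovian, sparse evaluations" that the constraint-redistribution scheme is said to address.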

Paper Structure

This paper contains 17 sections, 13 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: An exemplar peg-in-shallow-hole task, where one corner of the peg must maintain contact with the hole during insertion using a parallel jaw gripper. By learning temporal logic constraints from demonstrations, our method successfully transfers the constrained insertion behavior to an unseen, tilted hole environment.
  • Figure 2: Illustration of ILCL, which learns a TLTL constraint from demonstrations and generalizes to novel scenarios. ILCL consists of two parts: GA-TL-Mining (middle top), which generates a TLTL constraint that distinguishes demonstration trajectories from generated trajectories through a genetic algorithm on the TL syntax tree; and Logic-CRL, which optimizes the policy for task rewards under the generated TL constraint using SAC-Lag [ha2021learning]. Once ILCL identifies a TL constraint, Logic-CRL derives a new policy applicable to novel scenarios.
  • Figure 3: Illustration of the operations used in offspring generation. For each pTLTL tree in the figure, the blue nodes represent logical operators, and the orange nodes represent pAP nodes. The label of a pAP node indicates its semantics; for example, the label '2:<' refers to ${\bf s}^{(2)} < \theta$ for an undetermined parameter $\theta$.
  • Figure 4: Comparison of temporal constraint learning and transfer performance in four simulated tasks. In each task, ILCL, MTICL, and HierAIRL first learn temporal constraints from demonstrations (top) that satisfy the ground-truth constraint $\phi$ (bottom). In novel environments, we train policies with the learned constraints or rewards to reproduce demonstration-like constrained behaviors. Left: in the navigation tasks, yellow shapes and black traces represent goals and trajectories, respectively. Middle: in the wiping task, green and red flags indicate start and goal locations, respectively. The red curve represents the observed contact force, and the black dotted line represents the contact threshold from the demonstration. Right: in the peg-in-shallow-hole task, red, green, and gray objects represent a gripper, a peg, and a hole, respectively.
  • Figure 5: Comparison of the proposed ILCL and baseline methods in training and test environments. 'Expert' denotes the result of expert demonstrations without constraint violations. We normalize all REW values to the 'Expert' score, resulting in a range of $[0,1]$.
  • ...and 2 more figures
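
The Figure 3 caption explains that a pAP label such as '2:<' denotes ${\bf s}^{(2)} < \theta$ with $\theta$ left undetermined. A minimal sketch of how such a label could be turned into a predicate whose threshold is bound later is shown below. The function `parse_pap`, the `>` counterpart operator, and the zero-based reading of the state index are assumptions made for illustration; the source does not specify them.

```python
import operator
from typing import Callable, Sequence

# Comparison symbols this sketch understands. Only '<' appears in the
# caption; '>' is an assumed counterpart.
_OPS = {"<": operator.lt, ">": operator.gt}


def parse_pap(label: str) -> Callable[[Sequence[float], float], bool]:
    """Turn a label like '2:<' into a predicate (state, theta) -> bool.

    The state index is read as a zero-based position, which is an
    assumption of this sketch.
    """
    dim_text, op_text = label.split(":")
    dim = int(dim_text)
    compare = _OPS[op_text]

    def predicate(state: Sequence[float], theta: float) -> bool:
        # theta stays a free parameter until the caller supplies it.
        return compare(state[dim], theta)

    return predicate


below = parse_pap("2:<")
state = [0.3, 1.2, 0.7]
print(below(state, 1.0))  # state[2] = 0.7 is below theta = 1.0
print(below(state, 0.5))  # state[2] = 0.7 is not below theta = 0.5
```

Keeping the threshold as a separate argument mirrors the caption's distinction between the structure of a parameterized tree and the value of its undetermined parameter.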