Inductive Generalization in Reinforcement Learning from Specifications

Vignesh Subramanian; Rohit Kushwah; Subhajit Roy; Suguman Bansal

TL;DR

This work tackles zero-shot generalization in reinforcement learning when tasks are connected by an inductive structure expressed through logical specifications. It introduces a policy generator that, given an inductive task family, outputs per-instance policies by exploiting inductive relations on the edges of a common abstract graph. The core technical contribution is learning an inductive relation on edge policies, parameterized as an $m$-degree $\kappa$-polynomial, and composing these into path policies with learned guards to navigate the task DAG. Empirically, GenRL demonstrates strong generalization across long-horizon tasks in both simple and complex dynamics, including robot pick-and-place and classical control benchmarks, outperforming baselines that learn a single policy across tasks. This approach has practical implications for scalable, reusable policy generation in robotics and control, enabling rapid adaptation to unseen but structurally similar tasks while highlighting avenues for future theoretical guarantees and scalability.
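The idea of an inductive relation on edge policies can be illustrated with a small sketch. This is not the paper's GenRL implementation: it assumes, purely for illustration, that each edge policy is a flat parameter vector and that every coordinate of instance $i$ is a degree-$m$ polynomial of the same coordinate in instance $i-1$. The summary does not define the role of $\kappa$, so the sketch leaves it out. The names `fit_inductive_relation` and `generate_parameters` are hypothetical.

```python
import numpy as np


def fit_inductive_relation(thetas, degree=1):
    """Fit, per coordinate, theta_i[j] ~ poly_j(theta_{i-1}[j]) by least squares.

    thetas: array of shape (n_instances, n_params) holding the parameters of
    edge policies already trained on instances 0..n-1 (assumed data layout).
    Returns coefficients of shape (n_params, degree + 1), highest power first.
    """
    thetas = np.asarray(thetas, dtype=float)
    prev, nxt = thetas[:-1], thetas[1:]
    coeffs = []
    for j in range(thetas.shape[1]):
        # Columns are x^degree, ..., x, 1 for the previous instance's value.
        features = np.vander(prev[:, j], degree + 1)
        solution, *_ = np.linalg.lstsq(features, nxt[:, j], rcond=None)
        coeffs.append(solution)
    return np.array(coeffs)


def generate_parameters(theta_last, coeffs, steps=1):
    """Apply the fitted relation `steps` times to reach an unseen instance."""
    theta = np.asarray(theta_last, dtype=float)
    for _ in range(steps):
        theta = np.array([np.polyval(c, x) for c, x in zip(coeffs, theta)])
    return theta


# Toy training data: four instances whose parameters drift by a fixed offset.
trained = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]]
relation = fit_inductive_relation(trained, degree=1)
# Zero-shot parameters for the instance two steps beyond the last trained one.
unseen = generate_parameters(trained[-1], relation, steps=2)
```

Once the relation is fitted, parameters for later instances come from repeated application rather than further training, which is the sense in which a policy generator operates zero-shot.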

Abstract

We present a novel inductive generalization framework for RL from logical specifications. Many interesting tasks in RL environments have a natural inductive structure. These inductive tasks have similar overarching goals but they differ inductively in low-level predicates and distributions. We present a generalization procedure that leverages this inductive relationship to learn a higher-order function, a policy generator, that generates appropriately adapted policies for instances of an inductive task in a zero-shot manner. An evaluation of the proposed approach on a set of challenging control benchmarks demonstrates the promise of our framework in generalizing to unseen policies for long-horizon tasks.

Paper Structure (69 sections, 1 theorem, 17 equations, 27 figures, 7 tables, 3 algorithms)

This paper contains 69 sections, 1 theorem, 17 equations, 27 figures, 7 tables, 3 algorithms.

Introduction
Related Work.
Preliminaries
Markov Decision Process (MDP).
Spectrl Specification Language.
Abstract Graph.
Generalizable RL for Inductive Tasks
Inductive Tasks.
Generalizable RL for Inductive Tasks.
Inductive Learning of Policy Generator
Learning an Inductive Relation on Edges
Neural Policies.
Learning the Policy Generator
Algorithm
Empirical Evaluation
...and 54 more sections

Key Result

Lemma 3.2

For an inductive task $\mathsf{R}$, let $\mathcal{G}_i$ be the abstract graph of the specification of the $i$-th task instance $\mathsf{R}_i$. Then, all the $\mathcal{G}_i$s share a common DAG structure with the same initial and final vertices.
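The lemma can be made concrete with a short sketch that checks the property on toy data. It models the Choice task of Figure 2 (visit $g_1$ or $g_2$, then the goal) under an assumed representation: an abstract graph is an initial vertex, a final vertex, and a map from edges to instance-specific labels. The class `AbstractGraph`, the helper names, and the label strings are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass
class AbstractGraph:
    initial: str
    final: str
    edges: dict  # maps (source, target) to an instance-specific label


def is_dag(graph):
    """Return True if the edge set contains no directed cycle."""
    successors = {}
    for source, target in graph.edges:
        successors.setdefault(source, set()).add(target)
        successors.setdefault(target, set())
    try:
        # prepare() raises CycleError when the graph has a cycle.
        TopologicalSorter(successors).prepare()
    except CycleError:
        return False
    return True


def share_structure(graphs):
    """Check that all graphs are DAGs with identical vertices and edges."""
    first = graphs[0]
    return all(
        is_dag(g)
        and g.initial == first.initial
        and g.final == first.final
        and set(g.edges) == set(first.edges)
        for g in graphs
    )


def choice_instance(i):
    """Toy i-th instance: same shape, labels depend on the instance index."""
    return AbstractGraph(
        initial="init",
        final="goal",
        edges={
            ("init", "g1"): f"reach g1 from initial distribution {i}",
            ("init", "g2"): f"reach g2 from initial distribution {i}",
            ("g1", "goal"): "reach goal",
            ("g2", "goal"): "reach goal",
        },
    )


instances = [choice_instance(i) for i in range(3)]
same = share_structure(instances)
```

Only the edge labels vary with `i`. The vertex set, edge set, and endpoints are identical across instances, which is what lets edge policies be related across instances edge by edge.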

Figures (27)

Figure 1: Tower Destacking: The task is to pick boxes from Source and stack them on Target.
Figure 2: Choice: visit either $g_1$ or $g_2$, then visit goal; task instances differ in initial state distribution.
Figure 3: Moving initial and goal distributions
Figure 4: Moving initial distribution, goal stationary
Figure 5: Moving initial distribution with obstacle
...and 22 more figures

Theorems & Definitions (6)

Definition 2.1
Definition 3.1
Lemma 3.2
proof
Definition 3.3: Learning a Policy Generator
Definition C.1 (Jothimurugan et al., 2021)
