Diverse Controllable Diffusion Policy with Signal Temporal Logic

Yue Meng, Chuchu Fan

TL;DR

We address the problem of generating diverse, rule-compliant driving behaviors for realistic simulators by combining a parametric STL formulation with a diffusion policy trained on augmented data. The pipeline calibrates STL parameters from real data, uses trajectory optimization to produce multiple outcomes per scene and driving mode, and learns a diffusion model conditioned on the scene and the STL parameters, with a RefineNet that adds diversity while enforcing the rules. On NuScenes, the approach leads the compared baselines in open-loop and closed-loop diversity, STL satisfaction, and safety, and its behavior can be controlled by changing the STL parameters. A human-robot case study shows near-oracle trajectory distributions with substantial speedups. The method advances realistic agent modeling for autonomous driving and human-robot interaction, and its open-source tooling supports simulators and evaluation pipelines.

Abstract

Generating realistic simulations is critical for autonomous system applications such as self-driving and human-robot interaction. However, today's driving simulators still have difficulty generating controllable, diverse, and rule-compliant behaviors for road participants: rule-based models cannot produce diverse behaviors and require careful tuning, whereas learning-based methods imitate the policy from data but are not designed to follow the rules explicitly. Moreover, real-world datasets are by nature "single-outcome", which makes it hard for learning methods to generate diverse behaviors. In this paper, we leverage Signal Temporal Logic (STL) and diffusion models to learn a controllable, diverse, and rule-aware policy. We first calibrate the STL on the real-world data, then generate diverse synthetic data using trajectory optimization, and finally learn the rectified diffusion policy on the augmented dataset. We test on the NuScenes dataset, where our approach achieves the most diverse rule-compliant trajectories among the compared baselines, with a runtime 1/17 that of the second-best approach. In closed-loop testing, our approach reaches the highest diversity, the highest rule satisfaction rate, and the lowest collision rate. Our method can generate varied characteristics conditioned on different STL parameters at test time. A case study on human-robot encounter scenarios shows that our approach can generate diverse and close-to-oracle trajectories. The annotation tool, augmented dataset, and code are available at https://github.com/mengyuest/pSTL-diffusion-policy.
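To make "rule compliance under STL parameters" concrete, the following is a minimal sketch of the standard quantitative (robustness) semantics for two "always" rules. The rule set, the function names, and the parameter names `v_max` and `d_safe` are illustrative assumptions, not the formulas or code used in the paper.

```python
from typing import Sequence


def always_leq(signal: Sequence[float], upper: float) -> float:
    """Robustness of G(signal <= upper): the worst-case margin over time."""
    return min(upper - s for s in signal)


def always_geq(signal: Sequence[float], lower: float) -> float:
    """Robustness of G(signal >= lower): the worst-case margin over time."""
    return min(s - lower for s in signal)


def rule_robustness(speeds: Sequence[float], gaps: Sequence[float],
                    v_max: float, d_safe: float) -> float:
    """Conjunction of both rules; under STL semantics, AND is a minimum.

    v_max and d_safe are hypothetical STL parameters (speed limit and
    safe-distance threshold). A positive result means the trajectory
    satisfies both rules; a negative result means at least one is violated.
    """
    return min(always_leq(speeds, v_max), always_geq(gaps, d_safe))


# Hypothetical trajectory: speed (m/s) and gap to the nearest vehicle (m).
speeds = [8.0, 9.5, 10.0, 9.0]
gaps = [12.0, 10.0, 7.5, 9.0]

relaxed = rule_robustness(speeds, gaps, v_max=12.0, d_safe=5.0)
strict = rule_robustness(speeds, gaps, v_max=9.0, d_safe=5.0)
```

Here `relaxed` evaluates to 2.0 (both rules hold, with the speed rule being the tighter one), while `strict` evaluates to -1.0 because the trajectory exceeds the lower speed limit by 1 m/s. The same trajectory can therefore pass or fail depending only on the STL parameters, which is what makes those parameters usable as a control knob.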

Paper Structure

This paper contains 18 sections, 9 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Learning framework. The neural encoder embeds the scene into a feature vector. The DDPM takes the feature vector, the STL parameters (indicating driving modes, speed limit, safe distance threshold, etc.), and Gaussian noise to generate trajectories. RefineNet takes the upstream trajectories and features and generates diverse and rule-compliant trajectories.
  • Figure 2: Open-loop visualizations (green: "left-lane-change", red: "right-lane-change", blue: "lane-keeping"). Our approach generates trajectories closest to the trajectory optimization (Traj. Opt.) solution and yields the largest trajectory coverage among all the learning methods.
  • Figure 3: Diverse behaviors due to varied STL parameters. When the speed limit is low, the agent waits until all vehicles have passed the roundabout. At a medium speed limit, the agent joins the middle of the queue but yields to other vehicles moving at high speed. At a high speed limit, the agent joins the queue and keeps its place while traversing the roundabout.
  • Figure 4: Valid trajectories for the human-robot encounters. VAE and T.S. (TrafficSim) cannot capture diverse trajectories, whereas OursG (Ours+guidance) is close to the oracle.
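The sampling path described in the Figure 1 caption (scene feature, STL parameters, and Gaussian noise in; trajectory out) can be sketched as a conditional DDPM ancestral sampling loop. This is only a toy under stated assumptions: `toy_noise_predictor` is a hypothetical closed-form stand-in for the trained denoiser, `scene_feature` is a made-up list of desired speeds rather than a neural embedding, `v_max` is an assumed parameter name, and the RefineNet stage is omitted.

```python
import math
import random


def linear_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step betas and the cumulative products of (1 - beta)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def toy_noise_predictor(x, alpha_bar, scene_feature, stl_params):
    """Hypothetical stand-in for a trained, conditioned denoiser.

    It pretends the only valid trajectory is the desired speed profile
    clipped to v_max, for which the noise estimate has a closed form.
    """
    target = [min(f, stl_params["v_max"]) for f in scene_feature]
    return [(xi - math.sqrt(alpha_bar) * ti) / math.sqrt(1.0 - alpha_bar)
            for xi, ti in zip(x, target)]


def ddpm_sample(scene_feature, stl_params, num_steps=50, seed=0):
    """Ancestral sampling: start from Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = linear_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in scene_feature]
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, alpha_bars[t], scene_feature, stl_params)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        # No noise is added on the final step, as in standard DDPM sampling.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x


# Same hypothetical scene, two different STL speed limits.
slow = ddpm_sample([6.0, 8.0, 10.0], {"v_max": 7.0})
fast = ddpm_sample([6.0, 8.0, 10.0], {"v_max": 12.0})
```

Because the stand-in denoiser is exact for a single target, the last denoising step lands on the clipped profile up to floating-point error: roughly `[6, 7, 7]` for `slow` and `[6, 8, 10]` for `fast`. A trained network would instead produce a distribution of trajectories, but the sketch shows how changing only the STL parameters changes the sampled behavior.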