A Safe Preference Learning Approach for Personalization with Applications to Autonomous Vehicles

Ruya Karagulle, Nikos Arechiga, Andrew Best, Jonathan DeCastro, Necmiye Ozay

TL;DR

This letter introduces a preference learning method that ensures adherence to given specifications. Applied to autonomous vehicles, it yields competitive results compared to existing preference learning methods in terms of capturing preferences and notably outperforms them when safety is considered.

Abstract

This work introduces a preference learning method that ensures adherence to given specifications, with an application to autonomous vehicles. Our approach incorporates the priority ordering of Signal Temporal Logic (STL) formulas describing traffic rules into a learning framework. By leveraging Parametric Weighted Signal Temporal Logic (PWSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons and propose an approach to solve this learning problem. Our approach finds a feasible valuation for the weights of the given PWSTL formula such that, with these weights, preferred signals have weighted quantitative satisfaction measures greater than their non-preferred counterparts. The feasible valuation of weights given by our approach leads to a weighted STL formula that can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with a pilot human subject study in two different simulated driving scenarios involving a stop sign and a pedestrian crossing. Our approach yields competitive results compared to existing preference learning methods in terms of capturing preferences and notably outperforms them when safety is considered.
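The weight-finding idea described in the abstract can be illustrated with a small sketch. It is not the paper's PWSTL semantics or its algorithm: it assumes a simplified weighted conjunction, `min_i(w_i * r_i)` with strictly positive weights, and it assumes each signal has already been reduced to a list of per-rule robustness margins. The names `weighted_robustness`, `sample_feasible_weights`, and the toy data are hypothetical. Random sampling is used only as a simple stand-in for a search over weight valuations.

```python
import random

def weighted_robustness(weights, margins):
    """Assumed simplified semantics for a weighted conjunction: min_i(w_i * r_i).

    With strictly positive weights the sign of the result equals the sign of
    the unweighted min_i(r_i), so satisfaction/violation is unchanged.
    """
    return min(w * r for w, r in zip(weights, margins))

def sample_feasible_weights(pairs, n_weights, n_samples=2000, seed=0):
    """Randomly sample positive weights and keep the valuation that orders
    the most (preferred, non_preferred) pairs correctly."""
    rng = random.Random(seed)
    best_w, best_score = None, -1
    for _ in range(n_samples):
        w = [rng.uniform(0.1, 1.0) for _ in range(n_weights)]
        score = sum(
            weighted_robustness(w, pref) > weighted_robustness(w, non_pref)
            for pref, non_pref in pairs
        )
        if score > best_score:
            best_w, best_score = w, score
        if best_score == len(pairs):  # every comparison is satisfied
            break
    return best_w, best_score

# Hypothetical data: per-rule robustness margins for two rules, given as
# (preferred signal, non-preferred signal). All margins are positive,
# i.e. every signal satisfies both rules.
pairs = [
    ((2.0, 1.0), (1.5, 3.0)),  # unweighted min ranks this pair the wrong way
    ((1.0, 0.5), (0.4, 2.0)),
]

weights, score = sample_feasible_weights(pairs, n_weights=2)
print("weights:", weights, "pairs ordered correctly:", score, "of", len(pairs))
```

In the first toy pair the unweighted minimum is 1.0 for the preferred signal and 1.5 for the non-preferred one, so unweighted robustness alone would rank them the wrong way; a suitable positive weighting reverses the order without changing the sign of either value.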

Paper Structure

This paper contains 14 sections, 5 theorems, 15 equations, 3 figures, and 2 tables.

Key Result

Lemma 1

Let $\tilde{r}: \mathcal{S} \times \mathcal{F} \times \mathbb{T} \to \mathbb{R}$ be a quantitative semantics. For a WSTL formula $\phi$, let $\phi_{\rm STL}$ be the STL formula obtained by removing the weights in $\phi$. If $\text{sign}(\rho(s,\phi_{\rm STL},t)) = \text{sign}(\tilde{r}(s,\phi,t))$ f

Figures (3)

  • Figure 1: Syntax tree of $\phi$ from Example 1.
  • Figure 2: The two scenarios used in the experiments.
  • Figure 3: Human subject study results for the two scenarios for four of the users. "STL" denotes the traditional (unweighted) robustness when it is used directly, "RS" denotes our method with random sampling, "GB" denotes our method with gradient-based optimization, "BT" denotes SGD with Bradley-Terry model, and "SVM" represents SVM classification.

Theorems & Definitions (10)

  • Example 1
  • Example 1
  • Lemma 1
  • Theorem 1
  • Definition 1: Preference Data
  • Lemma 2
  • Theorem 2
  • Example 1
  • Proposition 1
  • Remark 1