A Safe Preference Learning Approach for Personalization with Applications to Autonomous Vehicles

Ruya Karagulle; Nikos Arechiga; Andrew Best; Jonathan DeCastro; Necmiye Ozay

TL;DR

This letter introduces a preference learning method that ensures adherence to given specifications and applies it to autonomous vehicles. The method is competitive with existing preference learning methods at capturing preferences and notably outperforms them when safety is considered.

Abstract

This work introduces a preference learning method that ensures adherence to given specifications, with an application to autonomous vehicles. Our approach incorporates the priority ordering of Signal Temporal Logic (STL) formulas describing traffic rules into a learning framework. By leveraging Parametric Weighted Signal Temporal Logic (PWSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons and propose an approach to solve this learning problem. Our approach finds a feasible valuation for the weights of the given PWSTL formula such that, with these weights, preferred signals have weighted quantitative satisfaction measures greater than their non-preferred counterparts. The feasible valuation of weights given by our approach leads to a weighted STL formula that can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with a pilot human subject study in two different simulated driving scenarios involving a stop sign and a pedestrian crossing. Our approach yields competitive results compared to existing preference learning methods in terms of capturing preferences and notably outperforms them when safety is considered.
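The core idea in the abstract, searching for weights under which every preferred signal scores higher than its non-preferred counterpart without changing which signals satisfy the specification, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the weighted semantics is a toy one (a conjunction of predicates scored as the minimum of positively weighted margins), not the WSTL semantics defined in the paper, and the names `weighted_robustness`, `count_satisfied`, `random_search`, and the data in `PAIRS` are hypothetical.

```python
import random


def weighted_robustness(margins, weights):
    """Toy weighted conjunction: minimum of positively weighted predicate margins.

    Assumption: all weights are strictly positive, so each weighted margin keeps
    the sign of the original margin and the result has the same sign as the
    unweighted minimum. A violating signal therefore stays negative.
    """
    return min(w * m for w, m in zip(weights, margins))


def count_satisfied(pairs, weights):
    """Count pairs whose preferred signal scores strictly higher."""
    return sum(
        weighted_robustness(preferred, weights) > weighted_robustness(other, weights)
        for preferred, other in pairs
    )


def random_search(pairs, n_weights, n_samples=2000, seed=0):
    """Sample positive weight vectors and keep the one satisfying the most pairs."""
    rng = random.Random(seed)
    best_w = [1.0] * n_weights  # unit weights correspond to the unweighted score
    best_score = count_satisfied(pairs, best_w)
    for _ in range(n_samples):
        w = [rng.uniform(0.1, 10.0) for _ in range(n_weights)]
        score = count_satisfied(pairs, w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Hypothetical data: each signal is summarized by two predicate margins,
# and each pair is (preferred, non-preferred).
PAIRS = [
    ((1.0, 3.0), (2.0, 2.5)),
    ((4.0, 1.0), (3.0, 0.5)),
]

if __name__ == "__main__":
    weights, score = random_search(PAIRS, n_weights=2)
    print("weights:", weights, "pairs satisfied:", score, "of", len(PAIRS))
```

With unit weights the first pair is ranked the wrong way round, so the search has to move away from the unweighted score to agree with both comparisons. Because the sampled weights are strictly positive, a signal with a negative margin keeps a negative score under any candidate, which is the sign-preservation property this toy setup relies on.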

Paper Structure (14 sections, 5 theorems, 15 equations, 3 figures, 2 tables)

Introduction
Literature Review
Preliminaries
Signal Temporal Logic (STL)
Weighted Signal Temporal Logic (WSTL)
Problem Statement and Solution Method
An Optimization Reformulation
Computational Approach
Gradient-based optimization
Random Sampling
Experiments
Baseline Methods
Comparison of Solution Approaches
Conclusion, Limitations, and Future Work

Key Result

Lemma 1

Let $\tilde{r}: \mathcal{S} \times \mathcal{F} \times \mathbb{T} \to \mathbb{R}$ be a quantitative semantics. For a WSTL formula $\phi$, let $\phi_{\rm STL}$ be the STL formula obtained by removing the weights in $\phi$. If $\text{sign}(\rho(s,\phi_{\rm STL},t)) = \text{sign}(\tilde{r}(s,\phi,t))$ f

Figures (3)

Figure 1: Syntax tree of $\phi$ of Example 1.
Figure 2: The two scenarios used in the experiments.
Figure 3: Human subject study results for the two scenarios for four of the users. "STL" denotes the traditional (unweighted) robustness when it is used directly, "RS" denotes our method with random sampling, "GB" denotes our method with gradient-based optimization, "BT" denotes SGD with Bradley-Terry model, and "SVM" represents SVM classification.

Theorems & Definitions (10)

Example 1
Example 1
Lemma 1
Theorem 1
Definition 1: Preference Data
Lemma 2
Theorem 2
Example 1
Proposition 1
Remark 1
