A Framework for Learning Scoring Rules in Autonomous Driving Planning Systems

Zikang Xiong, Joe Kurian Eappen, Suresh Jagannathan

TL;DR

This work presents FLoRA, a framework that learns interpretable scoring rules for autonomous driving by representing rules as differentiable temporal-logic formulas. A Scoring Logic Network (SLN) with Temporal, Propositional, and Aggregation layers learns both the rule structure and the predicate thresholds from positive (successful) driving demonstrations in NuPlan, using a regularization scheme to avoid shortcut solutions. The learned rules are extracted into human-readable condition-action pairs and evaluated in NuPlan closed-loop simulations, where SLN outperforms expert-crafted rules and neural critics across multiple proposers while preserving interpretability. The framework acts as a plug-in module that can enhance safety, reliability, and transparency in motion planning, with potential extensions to predicate discovery and accident-based supervision.

Abstract

In autonomous driving systems, motion planning is commonly implemented as a two-stage process: first, a trajectory proposer generates multiple candidate trajectories; then, a scoring mechanism selects the most suitable trajectory for execution. For this critical selection stage, rule-based scoring mechanisms are particularly appealing because they can explicitly encode driving preferences, safety constraints, and traffic regulations in a formalized, human-understandable format. However, manually crafting these scoring rules presents significant challenges: the rules often contain complex interdependencies, require careful parameter tuning, and may not fully capture the nuances present in real-world driving data. This work introduces FLoRA, a novel framework that bridges this gap by learning interpretable scoring rules represented in temporal logic. Our method features a learnable logic structure that captures nuanced relationships across diverse driving scenarios, optimizing both rules and parameters directly from real-world driving demonstrations collected in NuPlan. Our approach effectively learns to evaluate driving behavior even though the training data contains only positive examples (successful driving demonstrations). Evaluations in closed-loop planning simulations demonstrate that our learned scoring rules outperform existing techniques, including expert-designed rules and neural network scoring models, while maintaining interpretability. This work introduces a data-driven approach to enhance the scoring mechanism in autonomous driving systems, designed as a plug-in module that integrates with various trajectory proposers. Our video and code are available at xiong.zikang.me/FLoRA.
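The two-stage process described in the abstract can be illustrated with a minimal sketch. This is not the FLoRA implementation: the `Trajectory` fields, the `toy_score` function, and its thresholds (4.0 m/s^2 and 0.5 m) are hypothetical placeholders chosen only to show how a scorer plugs in behind an arbitrary proposer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical summary of one candidate trajectory (not a NuPlan type)."""
    name: str
    max_lateral_accel: float  # m/s^2, assumed comfort feature
    min_curb_distance: float  # m, assumed safety feature


def select_best(
    candidates: Sequence[Trajectory],
    score: Callable[[Trajectory], float],
) -> Trajectory:
    """Stage two: return the highest-scoring candidate from the proposer."""
    if not candidates:
        raise ValueError("the proposer returned no candidates")
    # max() keeps the first candidate when several share the top score.
    return max(candidates, key=score)


def toy_score(t: Trajectory) -> float:
    """Illustrative hand-written scorer; both thresholds are assumptions."""
    comfort_ok = t.max_lateral_accel <= 4.0
    curb_ok = t.min_curb_distance >= 0.5
    return float(comfort_ok) + float(curb_ok)


if __name__ == "__main__":
    proposals = [
        Trajectory("red", max_lateral_accel=5.2, min_curb_distance=1.0),
        Trajectory("green", max_lateral_accel=2.1, min_curb_distance=0.9),
        Trajectory("blue", max_lateral_accel=2.0, min_curb_distance=0.1),
    ]
    print(select_best(proposals, toy_score).name)
```

Because `select_best` only depends on a scoring callable, a learned scorer could in principle replace `toy_score` without changing the proposer, which is the sense in which the abstract describes the scorer as a plug-in module.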

Paper Structure

This paper contains 28 sections, 1 theorem, 14 equations, 4 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

Aggregating the output of a single Propositional layer can represent any formula of the form given in Eq. \ref{eq:condition_action_pair}.

Figures (4)

  • Figure 1: This figure illustrates our framework for scoring and selecting trajectories in autonomous driving systems. Modern autonomous driving planners typically follow a propose-then-select paradigm, where multiple candidate trajectories are first generated and then filtered through a scoring mechanism. As shown in the Motion Plan Proposing block, multiple trajectories (colored lines) are proposed as potential future paths for the autonomous vehicle. These candidates need to be evaluated and ranked to select the most suitable trajectory for execution. Our key contribution, the Learning Scoring Rules block, demonstrates how we learn interpretable scoring rules from human driving demonstrations in NuPlan [Karnchanachari2024TowardsLP]. Instead of manually crafting rules or using black-box scoring models, we develop a Scoring Logic Network (SLN) that automatically learns temporal logic rules from data (specific rules are illustrated in Sec. \ref{sec:logic_rules_discovered}). These learned rules are then deployed in the Online Monitoring and Filtering block, which continuously evaluates the proposed trajectories during operation at 20 Hz, matching NuPlan's log frequency [Karnchanachari2024TowardsLP]. In the scene visualization, the ego vehicle is shown in gold, other vehicles in black, curbs in purple, and the drivable areas in pink (lanes) and blue (intersections). Among the proposed trajectories, the green trajectory receives the highest score because it maintains a safe distance from the curbs while exhibiting appropriate curvature and comfort characteristics. All other colored trajectories are rejected for violating one or more learned rules, as detailed in Sec. \ref{sec:logic_rules_discovered}. During monitoring, following the setting of the Planning Driver Model (PDM) [Dauner2023CORL], we assume that the future 4-second trajectories of other cars are known. The selected trajectory is then executed in the Closed-Loop Simulation block.
  • Figure 2: The logic structure $\bar{\mathcal{L}}$ consists of three types of layers: Temporal, Propositional, and Aggregation. The Temporal layer processes the initial predicates by applying temporal operators. The Propositional layer generates all possible pairs of predicates connected by logical operators. The Aggregation layer aggregates the output of the Propositional layer into one cluster by deciding which logic operator connects neighboring clusters. Temporal layers can be stacked. Each layer's formal definition is given in Sec. \ref{sec:layer_definition}. Two types of gates, the selection gate and the negation gate, control the logic operators and the sign of the cluster inputs, respectively. Each clear circle ($\bigcirc$) in these gates represents a single-valued weight. In the selection gate, the marked circle represents the operator with the largest weight, meaning that this operator is selected. In the negation gate, one circle marking represents the negation of the input (i.e., multiplication by a negative number), while the other represents the original input (i.e., multiplication by a positive number). The gate implementation is described in Sec. \ref{sec:gate_implementation}. Suppose we consider only a single Temporal layer ($n = 1$) and are given a set of predicates $\mathcal{P} = \{P^{1}_{\theta_1}, P^{2}_{\theta_2}, P^{3}_{\theta_3}\}$ with $P^{2}_{\theta_2} \in \bar{\mathcal{P}}$ and $P^{1}_{\theta_1}, P^{3}_{\theta_3} \in \dot{\mathcal{P}}$. Then this learnable logic structure represents the logic formula $(\mathbf{G} P^{1}_{\theta_1} \lor \lnot P^{2}_{\theta_2}) \lor (\lnot \mathbf{G} P^{1}_{\theta_1} \land \mathbf{F} P^{3}_{\theta_3}) \lor (\lnot P^{2}_{\theta_2} \land \mathbf{F} P^{3}_{\theta_3})$. This formula can be further reduced to $P^{2}_{\theta_2} \rightarrow (\mathbf{G} P^{1}_{\theta_1} \lor \mathbf{F} P^{3}_{\theta_3})$.
  • Figure 3: Case study on discovered rules. While our method evaluates 15 trajectory candidates in practice, we show only 4 representative trajectories here for visual clarity. From left to right, these sub-figures explain why the blue, red, and orange plans received lower scores. The blue plan violates the drivable area rule, the red plan exceeds comfort constraints on lateral acceleration, and the orange plan breaks the speed limit without overtaking context.
  • Figure 4: Proposal Number Ablation
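
The reduction stated in the Figure 2 caption can be checked with a small truth table. The sketch below is only a propositional sanity check, not part of the paper's code: it treats $\mathbf{G} P^{1}_{\theta_1}$, $P^{2}_{\theta_2}$, and $\mathbf{F} P^{3}_{\theta_3}$ as three independent Boolean atoms (an assumption that is sufficient for testing propositional equivalence), and the function names are hypothetical.

```python
from itertools import product


def full_formula(g_p1: bool, p2: bool, f_p3: bool) -> bool:
    """(G P1 or not P2) or (not G P1 and F P3) or (not P2 and F P3)."""
    return (g_p1 or not p2) or ((not g_p1) and f_p3) or ((not p2) and f_p3)


def reduced_formula(g_p1: bool, p2: bool, f_p3: bool) -> bool:
    """P2 -> (G P1 or F P3), rewritten as (not P2) or G P1 or F P3."""
    return (not p2) or g_p1 or f_p3


def formulas_equivalent() -> bool:
    """Compare both formulas on all 2^3 truth assignments of the atoms."""
    return all(
        full_formula(*values) == reduced_formula(*values)
        for values in product([False, True], repeat=3)
    )


if __name__ == "__main__":
    print(formulas_equivalent())
```

The check agrees with a short manual argument: since $A \lor (\lnot A \land C) \equiv A \lor C$, the first two clusters collapse to $\mathbf{G} P^{1}_{\theta_1} \lor \lnot P^{2}_{\theta_2} \lor \mathbf{F} P^{3}_{\theta_3}$, which absorbs the third cluster and is exactly the implication form.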

Theorems & Definitions (2)

  • Theorem 1
  • proof