
The Era of End-to-End Autonomy: Transitioning from Rule-Based Driving to Large Driving Models

Eduardo Nebot, Julie Stephany Berrio Perez

Abstract

Autonomous driving is undergoing a shift from modular, rule-based pipelines toward end-to-end (E2E) learning systems. This paper examines this transition by tracing the evolution from classical sense-perceive-plan-control architectures to large driving models (LDMs) capable of mapping raw sensor input directly to driving actions. We analyze recent developments, including Tesla's Full Self-Driving (FSD) V12 through V14, Rivian's Unified Intelligence platform, NVIDIA Cosmos, and emerging commercial robotaxi deployments, focusing on architectural design, deployment strategies, safety considerations, and industry implications. A key emerging product category is supervised E2E driving, often referred to as FSD (Supervised) or L2++, which several manufacturers plan to deploy from 2026 onwards. These systems can perform most of the Dynamic Driving Task (DDT) in complex environments while still requiring human supervision, shifting the driver's role to safety oversight. Early operational evidence suggests that E2E learning handles the long-tail distribution of real-world driving scenarios and that it is becoming a dominant commercial strategy. We also discuss how similar architectural advances may extend beyond autonomous vehicles (AVs) to other embodied AI systems, including humanoid robotics.

Paper Structure (25 sections, 6 figures)

Figures (6)

  • Figure 1: Following the definitions in 10.1109/TPAMI.2024.3435937: (a) the classical modular approach separates perception, prediction, and planning through intermediate representations such as bounding boxes and trajectories; (b) the end-to-end paradigm jointly learns interconnected modules, allowing information flow and backpropagation across perception, mapping, prediction, and planning components.
  • Figure 2: Training process for Large Driving Models. Phase 1 uses a curated dataset of good driving behavior to obtain a baseline model. In Phase 2, additional data, comprising fleet-collected edge cases and synthetically generated scenarios, are used to further train the policy via reinforcement learning, yielding an updated model with improved safety.
  • Figure 3: FSD (Supervised) operation: the driver selects the destination and initiates the system; the vehicle then performs 100% of the Dynamic Driving Task (DDT) for the entire journey to the destination.
  • Figure 4: Visualization interface: the system provides the driver with clear situational awareness and communicates the vehicle's immediate intended action and trajectory. (a) Top left: the vehicle will exit the roundabout. (b) Top right: the vehicle will follow the road while turning right. (c) Bottom right: the vehicle will turn right at an intersection. (d) The vehicle will proceed straight; the interface displays road infrastructure and nearby vehicles to indicate the vehicle's intended path and surrounding context.
  • Figure 5: Visual–Haptic–Dynamic interface: (a) Top left: the vehicle is waiting for a gap in traffic to merge. (b) Top right: the steering wheel begins to rotate (haptic), indicating that the vehicle has detected an opening. (c) Top left: the planned trajectory starts to appear (visual), the steering wheel moves (haptic), and the vehicle accelerates slowly (dynamic) to alert the driver that it is about to move. (d) The full trajectory is displayed in blue (visual) and the vehicle begins the merge (dynamic).
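The two-phase recipe in Figure 2 can be illustrated with a deliberately tiny toy problem. The sketch below is not the authors' or any manufacturer's training pipeline. It makes the following assumptions:

- a one-dimensional lane-keeping task, with lateral offset `x` and next offset `x + a`;
- a linear policy `a = w * x` standing in for the LDM;
- behavior cloning as the Phase 1 learner;
- REINFORCE with a state-dependent baseline as the Phase 2 reinforcement-learning method.

The function names, the curated demonstrations, the "fleet" edge cases, and the synthetic scenarios are all hypothetical values chosen for illustration.

```python
import random


def phase1_behavior_cloning(demos, lr=0.5, iters=200):
    """Phase 1 (assumed behavior cloning): fit w in a = w * x to curated
    (offset, expert_action) pairs by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(iters):
        grad = sum(2.0 * (w * x - a) * x for x, a in demos) / len(demos)
        w -= lr * grad
    return w


def reward(x, a):
    """Toy reward: penalize the lateral offset left after acting (x_next = x + a)."""
    return -(x + a) ** 2


def phase2_reinforce(w, scenarios, lr=0.05, iters=300, sigma=0.1, batch=128, seed=0):
    """Phase 2 (assumed REINFORCE): fine-tune w for a Gaussian policy
    a ~ N(w * x, sigma^2) on edge-case and synthetic scenarios."""
    rng = random.Random(seed)
    for _ in range(iters):
        grad = 0.0
        for _ in range(batch):
            x = rng.choice(scenarios)
            mean = w * x
            a = rng.gauss(mean, sigma)
            # Reward of the mean action serves as a state-dependent baseline.
            advantage = reward(x, a) - reward(x, mean)
            # Score function: d/dw log N(a; w * x, sigma^2) = (a - w * x) * x / sigma^2.
            grad += advantage * (a - mean) * x / sigma ** 2
        w += lr * grad / batch  # gradient ascent on expected reward
    return w


def evaluate(w, states):
    """Average reward of the deterministic policy a = w * x."""
    return sum(reward(x, w * x) for x in states) / len(states)


def train_two_phase(seed=0):
    # Hypothetical curated demos over nominal offsets; the expert corrects 80% per step.
    demos = [(i / 10, -0.8 * (i / 10)) for i in range(-10, 11)]
    w1 = phase1_behavior_cloning(demos)

    # Hypothetical fleet edge cases (large offsets) plus synthetic scenarios.
    gen = random.Random(seed + 1)
    edge_cases = [-3.0, -2.5, 2.5, 3.0]
    synthetic = [gen.uniform(-3.0, 3.0) for _ in range(50)]
    scenarios = edge_cases + synthetic
    w2 = phase2_reinforce(w1, scenarios, seed=seed)
    return w1, w2, scenarios


if __name__ == "__main__":
    w1, w2, scenarios = train_two_phase()
    print(f"baseline  w={w1:.3f}  reward={evaluate(w1, scenarios):.4f}")
    print(f"finetuned w={w2:.3f}  reward={evaluate(w2, scenarios):.4f}")
```

Under these toy assumptions, Phase 1 can only reproduce the curated expert (one that corrects 80% of the offset). Phase 2 then optimizes the reward directly on the harder scenarios and moves the policy toward full correction (`w` close to -1). Imitation supplies a reasonable starting policy, and reinforcement learning on edge cases refines it.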
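The escalating cue sequence in Figure 5 can be read as a small state machine. The following Python sketch is a hypothetical illustration, not a description of any production interface. Only the order of the cues comes from the caption. The following are all assumptions:

- the stage names and channel labels;
- the timing thresholds (`preview_s`, `alert_s`) and tick length;
- the rule that pre-merge cues are withdrawn if the gap closes.

```python
from enum import Enum, auto


class Stage(Enum):
    """Merge-cue stages, ordered as in panels (a) to (d) of Figure 5."""
    WAITING = auto()           # (a) waiting for a gap in traffic
    HAPTIC_PREVIEW = auto()    # (b) steering wheel begins to rotate
    MULTIMODAL_ALERT = auto()  # (c) partial trajectory, wheel motion, slow acceleration
    MERGING = auto()           # (d) full trajectory shown, merge begins


# Hypothetical channel labels active in each stage.
CUES = {
    Stage.WAITING: frozenset(),
    Stage.HAPTIC_PREVIEW: frozenset({"haptic"}),
    Stage.MULTIMODAL_ALERT: frozenset({"visual_partial", "haptic", "dynamic_slow_accel"}),
    Stage.MERGING: frozenset({"visual_full", "dynamic_merge"}),
}


def step(stage, gap_open, time_in_stage_s, preview_s=0.5, alert_s=1.0):
    """Advance the cue sequence by one tick; thresholds are placeholder values."""
    if stage is Stage.MERGING:
        return stage  # once merging, the sequence is complete
    if not gap_open:
        return Stage.WAITING  # assumption: withdraw pre-merge cues if the gap closes
    if stage is Stage.WAITING:
        return Stage.HAPTIC_PREVIEW
    if stage is Stage.HAPTIC_PREVIEW and time_in_stage_s >= preview_s:
        return Stage.MULTIMODAL_ALERT
    if stage is Stage.MULTIMODAL_ALERT and time_in_stage_s >= alert_s:
        return Stage.MERGING
    return stage


def simulate(gap_schedule, dt=0.25):
    """Run step() over a per-tick gap_open schedule; record the stage after each tick."""
    stage, time_in_stage = Stage.WAITING, 0.0
    history = []
    for gap_open in gap_schedule:
        new_stage = step(stage, gap_open, time_in_stage)
        time_in_stage = 0.0 if new_stage is not stage else time_in_stage + dt
        stage = new_stage
        history.append(stage)
    return history


if __name__ == "__main__":
    for stage in simulate([False] + [True] * 9):
        print(stage.name, sorted(CUES[stage]))
```

Holding the multimodal alert for a minimum time before merging reflects the caption's point that slow acceleration warns the driver before the vehicle moves. The specific durations are placeholders.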