Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion

Honglin He; Yukai Ma; Brad Squicciarini; Wayne Wu; Bolei Zhou

Abstract

Sidewalk micromobility is a promising solution for last-mile transportation, but current learning-based control methods struggle in complex urban environments. Imitation learning (IL) learns policies from human demonstrations, yet its reliance on fixed offline data often leads to compounding errors, limited robustness, and poor generalization. To address these challenges, we propose a framework that advances IL through corrective behavior expansion and multi-scale imitation learning. On the data side, we augment teleoperation datasets with diverse corrective behaviors and sensor augmentations to enable the policy to learn to recover from its own mistakes. On the model side, we introduce a multi-scale IL architecture that captures both short-horizon interactive behaviors and long-horizon goal-directed intentions via horizon-based trajectory clustering and hierarchical supervision. Real-world experiments show that our approach significantly improves robustness and generalization in diverse sidewalk scenarios.
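To make the idea of a corrective behavior concrete, the following is a minimal sketch of a deviation-then-recovery perturbation applied to a recorded 2D trajectory. It is not the authors' implementation: the raised-cosine offset profile, the function names `deviation_recovery_offsets` and `perturb_trajectory`, and all numeric values are assumptions chosen for illustration. It covers only the trajectory side and does not synthesize the corresponding observations.

```python
import math

def deviation_recovery_offsets(num_steps, peak_offset, peak_index):
    """Lateral offsets that rise smoothly from 0 to peak_offset, then return to 0.

    Hypothetical profile: a raised-cosine ramp, assumed here for illustration.
    """
    if not 0 < peak_index < num_steps - 1:
        raise ValueError("peak_index must lie strictly inside the trajectory")
    offsets = []
    for i in range(num_steps):
        if i <= peak_index:
            phase = i / peak_index  # 0 -> 1 while deviating
        else:
            phase = (num_steps - 1 - i) / (num_steps - 1 - peak_index)  # 1 -> 0 while recovering
        offsets.append(peak_offset * 0.5 * (1.0 - math.cos(math.pi * phase)))
    return offsets

def perturb_trajectory(waypoints, offsets):
    """Shift each (x, y) waypoint along the local left-hand normal by its offset."""
    n = len(waypoints)
    perturbed = []
    for i, (x, y) in enumerate(waypoints):
        # Estimate the heading from neighboring waypoints (one-sided at the ends).
        x0, y0 = waypoints[max(i - 1, 0)]
        x1, y1 = waypoints[min(i + 1, n - 1)]
        dx, dy = x1 - x0, y1 - y0
        norm = math.hypot(dx, dy) or 1.0
        nx, ny = -dy / norm, dx / norm  # unit normal pointing left of the heading
        perturbed.append((x + offsets[i] * nx, y + offsets[i] * ny))
    return perturbed

# Illustrative straight path along +x, sampled every 0.5 m (values are made up).
straight = [(0.5 * i, 0.0) for i in range(11)]
offsets = deviation_recovery_offsets(len(straight), peak_offset=0.4, peak_index=4)
perturbed = perturb_trajectory(straight, offsets)
```

The perturbed path starts and ends on the original trajectory, so the second half of the sequence is an example of steering back toward the demonstrated path after a drift.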

Paper Structure (16 sections, 5 equations, 6 figures, 5 tables)

Introduction
Related Work
Method
Problem Formulation
Multi-scale Imitation Learning with Anchors
Teleoperation Data Expansions
Experiments
Dataset
Implementation Details
Open-Loop Evaluation
Ablation Study
Real-World Deployment
Experimental Setup
Results
Conclusions and Future Work
...and 1 more section

Figures (6)

Figure 1: This work aims to utilize corrective behavior expansion and multi-scale prediction to learn an autopilot model for sidewalk micromobility.
Figure 2: Illustration of the MIMIC framework. The model adopts an encoder–decoder architecture. The context encoder builds the context from the observation sequence by combining coarse flattened features of the historical frames with fine patch-level features of the current frame, together with the goal point and camera features. The action decoder uses time-horizon-specific anchors to produce actions parameterized by Gaussian mixture models (GMMs) across multiple horizons, thereby enhancing the diversity and robustness of the output.
Figure 3: Illustration of the corrective behavior expansion. We first estimate the depth sequence and reconstruct a point cloud. Given the 3D point cloud, we perturb the trajectory using a deviation–recovery noise sequence. Then we synthesize corresponding observation-action pairs.
Figure 4: Illustration of the sensor augmentation. A pretrained relighting model is used to modify the scene guided by different lighting prompts. The original scenario is segmented into foreground and background regions where different relighting parameters are applied. The outputs are then blended to synthesize novel relighted observations.
Figure 5: Qualitative results of MIMIC on the CoS test set. The green trajectory denotes the one with the highest probability, while the others represent the top-6 trajectories filtered by non-maximum suppression (NMS).
...and 1 more figure
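
The Figure 5 caption mentions filtering predicted trajectories with non-maximum suppression (NMS). The sketch below shows one common greedy form of trajectory NMS. The suppression criterion (Euclidean distance between trajectory endpoints), the function name `trajectory_nms`, and the threshold and scores are assumptions for illustration; the caption does not specify how the paper measures overlap.

```python
import math

def trajectory_nms(trajectories, scores, distance_threshold, top_k):
    """Greedy NMS over candidate trajectories.

    Visits candidates from highest to lowest score and keeps one only if its
    endpoint is farther than distance_threshold from every endpoint already kept.
    Returns the indices of the kept trajectories, best first.
    """
    order = sorted(range(len(trajectories)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        end_i = trajectories[i][-1]
        if all(math.dist(end_i, trajectories[j][-1]) > distance_threshold for j in kept):
            kept.append(i)
        if len(kept) == top_k:
            break
    return kept

# Illustrative candidates as lists of (x, y) waypoints; values are made up.
candidates = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],    # straight ahead
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.1)],    # near-duplicate of the first
    [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],    # veer left
    [(0.0, 0.0), (1.0, -0.5), (2.0, -1.0)],  # veer right
]
scores = [0.4, 0.3, 0.2, 0.1]
kept = trajectory_nms(candidates, scores, distance_threshold=0.5, top_k=3)
```

In this example the near-duplicate candidate is suppressed by the higher-scoring straight trajectory, so the kept set spans three distinct maneuvers rather than two copies of the same one.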
