AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

Bingjie Tang, Iretiayo Akinola, Jie Xu, Bowen Wen, Ankur Handa, Karl Van Wyk, Dieter Fox, Gaurav S. Sukhatme, Fabio Ramos, Yashraj Narang

TL;DR

AutoMate addresses robotic assembly under high geometric and pose diversity. It introduces a dataset of 100 assemblies with parallelized simulation environments, and it proposes complementary learning schemes for specialist and generalist policies. Specialists combine assembly-by-disassembly, imitation-augmented RL, and trajectory matching (DTW and path signatures); the generalist is built via geometry-aware latent encodings, policy distillation, and curriculum-based RL fine-tuning. In simulation, evaluated over hundreds of thousands of trials, specialist policies individually solve 80 assemblies at success rates of 80% or higher, and a single generalist policy jointly solves 20 assemblies at 80%+. Zero-shot sim-to-real transfer, including perception-initialized assembly, yields real-world success rates similar to or better than those in simulation. The results support a scalable, hybrid approach to industrial robotic assembly and suggest extensions to multi-part assemblies and SE(3) trajectory alignment in future work.

Abstract

Robotic assembly for high-mixture settings requires adaptivity to diverse parts and poses, which is an open challenge. Meanwhile, in other areas of robotics, large models and sim-to-real have led to tremendous progress. Inspired by such work, we present AutoMate, a learning framework and system that consists of 4 parts: 1) a dataset of 100 assemblies compatible with simulation and the real world, along with parallelized simulation environments for policy learning, 2) a novel simulation-based approach for learning specialist (i.e., part-specific) policies and generalist (i.e., unified) assembly policies, 3) demonstrations of specialist policies that individually solve 80 assemblies with 80% or higher success rates in simulation, as well as a generalist policy that jointly solves 20 assemblies with an 80%+ success rate, and 4) zero-shot sim-to-real transfer that achieves similar (or better) performance than simulation, including on perception-initialized assembly. The key methodological takeaway is that a union of diverse algorithms from manufacturing engineering, character animation, and time-series analysis provides a generic and robust solution for a diverse range of robotic assembly problems. To our knowledge, AutoMate provides the first simulation-based framework for learning specialist and generalist policies over a wide range of assemblies, as well as the first system demonstrating zero-shot sim-to-real transfer over such a range. For videos and additional details, please see our project website: https://bingjietang718.github.io/automate/

Paper Structure

This paper contains 41 sections, 18 equations, 24 figures, 9 tables, 1 algorithm.

Figures (24)

  • Figure 2: Simulation-compatible assembly dataset. We provide a dataset of 100 assemblies derived from \cite{tian2022assemble}. The assemblies are interpenetration-free, allowing them to be simulated in widely-used robotics simulators.
  • Figure 3: Real-world versions of assemblies from our dataset. We print all 100 assemblies from our dataset in the real world and show 20 assemblies above, with unique IDs listed for later reference.
  • Figure 4: Simulation-based generation of disassembly paths. For each assembly, we generate disassembly paths by A) executing a grasp from a grasp optimization procedure, B) using a low-level controller to lift the plug from the socket and move to a randomized pose, and C) repeating the process for additional poses, until D) collecting 100 successful disassembly paths.
  • Figure 5: t-SNE visualization of geometric representations of 100 assemblies. We train a PointNet-based autoencoder to learn a latent representation of assembly geometry, and we use t-SNE (with perplexity = 6) to reduce the dimensionality of the latent vectors to 2D. Here we plot the lower-dimensional representations of all 100 assemblies. For visualization, we sample 10 assets that are well distributed across clusters. We also show examples of multiple assemblies sampled from the same cluster in Figure \ref{fig:t-sne_clusters}.
  • Figure 6: Simulation-based evaluation of trajectory-matching approaches for learning specialist policies. For each of the 100 assemblies, we train a specialist policy with 4 different approaches for matching the current robot path with demonstrations. For each approach, we train 5 random seeds, select the best seed, and evaluate it 5 times over 1000 trials. We illustrate average results over all 100 assemblies, as well as specific results for 10 sampled assemblies (Figure \ref{fig:t-sne_selected_assets}). IndustReal is a state-of-the-art matching-free approach. State selects the demonstration containing the closest point to the current robot state. Signature selects the demonstration with the minimum signature-transform distance from the robot trajectory. DTW selects the demonstration with the minimum dynamic-time-warping distance from the robot trajectory. The Signature and DTW approaches significantly outperform the others.
  • ...and 19 more figures
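
The State and DTW selection rules described in the Figure 6 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each path is a list of 3D positions, uses Euclidean distance between points, and runs the textbook full-sequence DTW recurrence. The function names (`dtw_distance`, `select_by_dtw`, `select_by_state`) and the toy demonstrations are hypothetical. The Signature variant is omitted.

```python
import math

def dtw_distance(path_a, path_b):
    """Textbook dynamic-time-warping distance between two point sequences."""
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i points of path_a
    # with the first j points of path_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(path_a[i - 1], path_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance along path_a only
                cost[i][j - 1],      # advance along path_b only
                cost[i - 1][j - 1],  # advance along both
            )
    return cost[n][m]

def select_by_dtw(robot_path, demos):
    """Index of the demonstration with minimum DTW distance to the robot path."""
    return min(range(len(demos)), key=lambda k: dtw_distance(robot_path, demos[k]))

def select_by_state(robot_state, demos):
    """Index of the demonstration containing the point closest to the robot state."""
    return min(
        range(len(demos)),
        key=lambda k: min(math.dist(robot_state, p) for p in demos[k]),
    )

# Toy demonstrations (assumed data): a straight descent and a sideways approach.
demos = [
    [(0.0, 0.0, 0.3), (0.0, 0.0, 0.2), (0.0, 0.0, 0.1), (0.0, 0.0, 0.0)],
    [(0.2, 0.0, 0.3), (0.1, 0.0, 0.2), (0.0, 0.0, 0.1), (0.0, 0.0, 0.0)],
]
robot_path = [(0.19, 0.0, 0.3), (0.12, 0.0, 0.22)]

print(select_by_dtw(robot_path, demos))            # sideways approach is closer
print(select_by_state((0.01, 0.0, 0.25), demos))   # straight descent is closer
```

State looks only at the current robot state, while DTW compares the whole path travelled so far, so the two rules can pick different demonstrations once the robot's history matters. Note that this sketch aligns the robot path against each full demonstration, endpoints included; how partial robot trajectories are compared in the paper is not specified in this excerpt.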