FoldPath: End-to-End Object-Centric Motion Generation via Modulated Implicit Paths

Paolo Rabino, Gabriele Tiboni, Tatiana Tommasi

TL;DR

FoldPath tackles Object-Centric Motion Generation by predicting long-horizon, smooth robot paths directly from 3D point clouds using a neural-field representation. The method abandons disjoint end-effector waypoint sequences and brittle post-processing in favor of end-to-end path prototypes decoded by modulated MLP heads, with path execution sampled via $s_t\in[-1,1]$. A DTW-based Average Precision metric suite is proposed to evaluate curve-aware path quality, and FoldPath achieves state-of-the-art performance on the PaintNet benchmark, including scenarios with limited training samples (as few as $70$) and real-world containers, demonstrating practical robustness and readiness for deployment. The results indicate FoldPath can generalize across free-form geometries and complex surface layouts, representing a meaningful advance toward scalable, industrially viable OCMG for spray painting and related robotic tasks.
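As a rough illustration of the distance underlying a DTW-based metric, the sketch below computes a plain dynamic time warping cost between two paths given as lists of 3D points. The function name `dtw_distance`, the Euclidean point distance, and the restriction to positions (no orientation) are assumptions made for this example; how the paper turns such distances into Average Precision is not described here and is not reproduced.

```python
import math


def dtw_distance(path_a, path_b):
    """Dynamic time warping cost between two sequences of 3D points.

    Hypothetical helper for illustration: uses Euclidean distance
    between points and the classic three-way recurrence.
    """
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    # cost[i][j] holds the best alignment cost of path_a[:i] with path_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(path_a[i - 1], path_b[j - 1])
            # Extend the cheapest of: skip in a, skip in b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
resampled = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
```

Because the alignment may repeat points, `resampled` traces the same curve as `straight` and is scored as identical even though the two lists differ in length, while `shifted` is penalized for its offset. This tolerance to sampling density is what makes a DTW-style distance a natural fit for comparing curves rather than index-aligned waypoints.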

Abstract

Object-Centric Motion Generation (OCMG) is instrumental in advancing automated manufacturing processes, particularly in domains requiring high-precision expert robotic motions, such as spray painting and welding. To realize effective automation, robust algorithms are essential for generating extended, object-aware trajectories across intricate 3D geometries. However, contemporary OCMG techniques are either based on ad-hoc heuristics or employ learning-based pipelines that still rely on sensitive post-processing steps to generate executable paths. We introduce FoldPath, a novel, end-to-end, neural-field-based method for OCMG. Unlike prior deep learning approaches that predict discrete sequences of end-effector waypoints, FoldPath learns the robot motion as a continuous function, thus implicitly encoding smooth output paths. This paradigm shift eliminates the need for brittle post-processing steps that concatenate and order the predicted discrete waypoints. Notably, our approach demonstrates superior predictive performance compared to recently proposed learning-based methods, and generalizes even in real industrial settings where only 70 expert samples are available. We validate FoldPath through comprehensive experiments in a realistic simulation environment and introduce new, rigorous metrics designed to evaluate long-horizon robotic paths, thus advancing the OCMG task towards practical maturity.

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: FoldPath tackles Object-Centric Motion Generation with an end-to-end deep learning pipeline for the first time. Unlike previous works that require multiple learning and post-processing stages, our model directly predicts long-horizon output paths that generalize across free-form 3D objects.
  • Figure 2: Schematic visualization of FoldPath. The model first encodes the input point cloud with a PointNet++ backbone (Qi et al., 2017), then processes the resulting visual features $\boldsymbol{z}$ with a transformer decoder, together with the learned queries $\boldsymbol{Q}_j$, to obtain path embeddings $\boldsymbol{P}_j$. Finally, each path-specific, neural-field-inspired head takes $\boldsymbol{P}_j$ and a scalar $x_{j,t}$ and yields the $t$-th 6D pose $\hat{\boldsymbol{y}}_{j,t}$ of the output path. The whole path is generated by sampling $T$ scalars $s_t \in [-1,1]$, $t=1,\ldots,T$.
  • Figure 3: Qualitative comparison of a single path from the cuboids test set between various activation functions for our FoldPath model. Predictions are shown in black, the ground truth in light red. ReLU activation yields sharp corners, which are undesirable for robotic applications. Siren activation yields good curves but lacks precision. Finer performs best overall.
  • Figure 4: Qualitative results on randomly chosen samples from the three main categories of the PaintNet benchmark. Path-wise struggles with long paths. Autoregressive paths are coherent but may miss details. MaskPlanner can generate highly precise paths, but its post-processing plays a key role in producing accurate results. Overall, FoldPath outputs the clearest paths without any post-processing step.
  • Figure 5: Qualitative evaluation in simulation for a random sample from the container test set. The colormap ranges from green (low) to yellow (high) paint deposition. Some paths can be hard to see because they intersect the object mesh, but in practice the nozzle is 12 cm away from the object, so even intersecting paths yield correct painting executions.
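
The decoding step described in the Figure 2 caption, where a head maps a path embedding and a scalar in $[-1,1]$ to one 6D pose, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' architecture: the names `sample_scalars`, `make_head`, `decode_pose`, and `decode_path` are hypothetical, the head has a single hidden layer with a plain sine activation and random weights, the embedding modulates the hidden units as a phase shift, and the scalars are evenly spaced.

```python
import math
import random


def sample_scalars(T):
    """Return T evenly spaced values in [-1, 1] (even spacing is an assumption)."""
    if T == 1:
        return [0.0]
    return [-1.0 + 2.0 * t / (T - 1) for t in range(T)]


def make_head(embed_dim, hidden_dim, seed=0):
    """Random weights for a toy one-hidden-layer head (hypothetical layout)."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-1.0, 1.0)

    return {
        # scalar -> hidden frequencies
        "w_in": [rand() for _ in range(hidden_dim)],
        # path embedding -> per-unit phase shift (the "modulation" in this sketch)
        "w_mod": [[rand() for _ in range(embed_dim)] for _ in range(hidden_dim)],
        # hidden -> 6D pose
        "w_out": [[rand() for _ in range(hidden_dim)] for _ in range(6)],
    }


def decode_pose(head, path_embedding, s):
    """Map one scalar s and a path embedding to a 6-element pose vector."""
    shifts = [sum(w * p for w, p in zip(row, path_embedding)) for row in head["w_mod"]]
    hidden = [math.sin(w * s + shift) for w, shift in zip(head["w_in"], shifts)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in head["w_out"]]


def decode_path(head, path_embedding, T):
    """Query the head at T scalars to obtain an ordered list of T poses."""
    return [decode_pose(head, path_embedding, s) for s in sample_scalars(T)]


head = make_head(embed_dim=4, hidden_dim=16)
embedding = [0.1, -0.2, 0.3, 0.0]  # stand-in for one path embedding
coarse = decode_path(head, embedding, T=5)
fine = decode_path(head, embedding, T=9)
```

Since the path is a function of the scalar rather than a fixed list of waypoints, `coarse` and `fine` are two samplings of the same curve: they share endpoints and differ only in resolution, and the pose order follows directly from the order of the scalars.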