RL-AD-Net: Reinforcement Learning Guided Adaptive Displacement in Latent Space for Refined Point Cloud Completion

Bhanu Pratap Paregi, Vaibhav Kumar

TL;DR

RL-AD-Net tackles local geometric artifacts in point cloud completion by performing post-hoc refinement in the latent space of a category-specific autoencoder. A per-category TD3 agent learns refinements to the autoencoder's $128$-D global feature vector (GFV), and the refined GFV is decoded back into a point cloud; a geometry-aware PointNN-based selector then chooses the better of the baseline and refined reconstructions, enabling deployment without ground-truth supervision. The approach is lightweight, model-agnostic, and relies on category priors learned by the autoencoder to correct local errors without retraining the base backbone. Experiments on ShapeNetCore-2048 demonstrate consistent improvements in CD-L2 and F-Score@1% across multiple cropping scenarios, validating latent-space RL refinement as a practical, plug-and-play enhancement for diverse completion models.
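The two metrics named above can be illustrated with a minimal NumPy sketch. This is not the paper's evaluation code. It assumes one common convention: CD-L2 is the sum of the two directional means of squared nearest-neighbour distances, and F-Score@1% uses a distance threshold of 0.01 on normalized coordinates. The function names and the default threshold are assumptions made for illustration.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between all points of a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=-1)

def chamfer_l2(pred, gt):
    """Symmetric Chamfer Distance with squared L2 distances (assumed convention)."""
    d = pairwise_sq_dists(pred, gt)
    # Mean distance pred -> gt plus mean distance gt -> pred.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def f_score(pred, gt, threshold=0.01):
    """Harmonic mean of precision and recall at a distance threshold (assumed 0.01)."""
    d = np.sqrt(pairwise_sq_dists(pred, gt))
    precision = np.mean(d.min(axis=1) < threshold)  # predicted points close to gt
    recall = np.mean(d.min(axis=0) < threshold)     # gt points covered by pred
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The dense pairwise matrix is fine for clouds of about 2048 points; larger clouds would typically use a KD-tree or a GPU nearest-neighbour routine instead.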

Abstract

Recent point cloud completion models, including transformer-based, denoising-based, and other state-of-the-art approaches, generate globally plausible shapes from partial inputs but often leave local geometric inconsistencies. We propose RL-AD-Net, a reinforcement learning (RL) refinement framework that operates in the latent space of a pretrained point autoencoder. The autoencoder encodes completions into compact global feature vectors (GFVs), which are selectively adjusted by an RL agent to improve geometric fidelity. To ensure robustness, a lightweight non-parametric PointNN selector evaluates the geometric consistency of both the original completion and the RL-refined output, retaining the better reconstruction. When ground truth is available, both Chamfer Distance and geometric consistency metrics guide refinement. Training is performed separately per category, since the unsupervised and dynamic nature of RL makes convergence across highly diverse categories challenging. Nevertheless, the framework can be extended to multi-category refinement in future work. Experiments on ShapeNetCore-2048 demonstrate that while baseline completion networks perform reasonably well under their training-style cropping, they struggle in random cropping scenarios. In contrast, RL-AD-Net consistently delivers improvements across both settings, highlighting the effectiveness of RL-guided ensemble refinement. The approach is lightweight, modular, and model-agnostic, making it applicable to a wide range of completion networks without requiring retraining.
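The refine-then-select flow described in the abstract can be sketched as follows. This is a hypothetical outline, not the authors' implementation: `encode`, `decode`, `policy`, and `consistency_score` are placeholder callables standing in for the autoencoder, the trained RL agent, and the PointNN selector. Two further assumptions are made: the agent's adjustment is applied additively to the GFV, and a higher score means better geometric consistency.

```python
import numpy as np

def refine_and_select(completion, encode, decode, policy, consistency_score):
    """Refine a baseline completion in latent space and keep the better output.

    All four callables are hypothetical stand-ins supplied by the caller.
    """
    gfv = encode(completion)       # completion -> global feature vector
    action = policy(gfv)           # agent's adjustment (assumed additive)
    refined = decode(gfv + action)  # adjusted GFV -> refined point cloud
    # Keep the refined cloud only if it scores strictly better (assumed rule).
    if consistency_score(refined) > consistency_score(completion):
        return refined, "refined"
    return completion, "baseline"
```

Because the baseline completion is returned whenever the refinement does not score higher, this selection step cannot make the output worse under the chosen score, which matches the abstract's description of retaining the better reconstruction.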

Paper Structure

This paper contains 26 sections, 4 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Overview of RL-AD-Net. A partial point cloud is first completed by a pretrained baseline backbone. The completion is encoded into a 128-D Global Feature Vector (GFV) via the autoencoder (AE). A TD3 RL agent predicts an adjustment vector to refine the GFV. The refined GFV is decoded back into a point cloud using the AE’s decoder. A reward signal, based on Chamfer Distance improvement and an action magnitude penalty, guides policy learning. Finally, a comparison block evaluates both the baseline and refined completions using Chamfer Distance and geometric consistency scores, and selects the better output as the final prediction.
  • Figure 2: Comparison of AdaPoinTr and RL-AD-Net outputs on Airplane (50% crop). Left: AdaPoinTr baseline; Right: RL-AD-Net refinement. RL-AD-Net improves recall while preserving precision, leading to a higher F-score@1%.
  • Figure 3: Qualitative visualizations under 25% spherical cropping. Categories: Car, Airplane, Table, and Lamp. RL-AD-Net yields more complete and geometrically consistent reconstructions than AdaPoinTr in the regions marked with red squares.
  • Figure 4: Qualitative visualizations under 50% spherical cropping. Categories: Car, Chair, and Lamp. RL-AD-Net reduces structural gaps and recovers missing regions (marked with red squares) while preserving fine details.
  • Figure 5: Qualitative visualizations under 40% random masking. Categories: Car, Table, Lamp, Chair, and Airplane. Black boxes mark hallucinated or distorted regions in AdaPoinTr outputs relative to the ground truth. Red circles highlight refinement failures in RL-AD-Net caused by propagation of these errors.
  • ...and 6 more figures
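
The Figure 1 caption describes a reward built from Chamfer Distance improvement and an action magnitude penalty. One plausible form is sketched below. The exact functional form is not given in this excerpt, so the squared-norm penalty, the `penalty_weight` parameter, and its default value are all assumptions made for illustration.

```python
import numpy as np

def refinement_reward(cd_baseline, cd_refined, action, penalty_weight=0.01):
    """Reward = Chamfer Distance improvement minus a weighted action penalty.

    The squared-L2 penalty and the default weight are illustrative assumptions.
    """
    improvement = cd_baseline - cd_refined  # positive when refinement helps
    penalty = penalty_weight * float(np.sum(np.square(action)))
    return improvement - penalty
```

Under this form, a large latent displacement is rewarded only if it reduces the Chamfer Distance by more than its penalty, which discourages the agent from moving the GFV far from the baseline encoding.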