Preference-Driven Active 3D Scene Representation for Robotic Inspection in Nuclear Decommissioning

Zhen Meng, Kan Chen, Xiangmin Xu, Erwin Jose Lopez Pulgarin, Emma Li, Philip G. Zhao, David Flynn

TL;DR

This work addresses the mismatch between traditional geometry/rendering-focused active 3D scene representations and operator-specific objectives in high-risk environments. It introduces a reinforcement-learning-from-human-feedback framework that learns a reward model from pairwise operator preferences and optimizes viewpoint planning via PPO, using a reward likelihood $P[\sigma_1 \succ \sigma_2]$ and a corresponding cross-entropy loss to train the reward estimate $\hat{r}$. The approach is validated on a UR3e-based reactor-tile inspection setup with 400 reconstructions and expert preferences, demonstrating improved scene fidelity and reduced trajectory length across multiple 3D representations. This operator-centric, online-learning paradigm advances adaptive, safety-critical robotic perception for nuclear decommissioning and similar high-risk tasks, with potential extensions to scalable human feedback and language-model-assisted reasoning.
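
The summary above names a preference likelihood and a cross-entropy loss but does not spell out their form. A minimal sketch follows, assuming the Bradley-Terry formulation commonly used in preference-based reinforcement learning, where the probability that segment $\sigma_1$ is preferred is a softmax over the summed predicted rewards of the two segments. The function names, the label convention `mu`, and the toy reward function are illustrative assumptions, not taken from the paper.

```python
import math


def segment_return(reward_fn, segment):
    """Sum the predicted reward r_hat over every step of a trajectory segment."""
    return sum(reward_fn(step) for step in segment)


def preference_probability(reward_fn, seg1, seg2):
    """P[seg1 > seg2] under an assumed Bradley-Terry model.

    Equals exp(R1) / (exp(R1) + exp(R2)), computed as a logistic function of
    the return difference to avoid overflow for large returns.
    """
    diff = segment_return(reward_fn, seg1) - segment_return(reward_fn, seg2)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(reward_fn, comparisons, eps=1e-12):
    """Mean cross-entropy between predicted and labelled preferences.

    Each comparison is (seg1, seg2, mu), where mu is 1.0 if the operator
    preferred seg1, 0.0 if they preferred seg2, and 0.5 for a tie
    (a hypothetical labelling convention).
    """
    total = 0.0
    for seg1, seg2, mu in comparisons:
        p = preference_probability(reward_fn, seg1, seg2)
        total -= mu * math.log(p + eps) + (1.0 - mu) * math.log(1.0 - p + eps)
    return total / len(comparisons)


if __name__ == "__main__":
    # Toy example: each step is a scalar "quality" score (hypothetical feature).
    toy_reward = lambda step: 2.0 * step
    seg_a, seg_b = [0.5, 0.5], [0.1, 0.2]
    print(preference_probability(toy_reward, seg_a, seg_b))
    print(preference_loss(toy_reward, [(seg_a, seg_b, 1.0)]))
```

In a full pipeline the reward function would be a learned network whose parameters are updated by gradient descent on this loss; the sketch only evaluates the loss for a fixed reward function.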

Abstract

Active 3D scene representation is pivotal in modern robotics applications, including remote inspection, manipulation, and telepresence. Traditional methods primarily optimize geometric fidelity or rendering accuracy, but often overlook operator-specific objectives, such as safety-critical coverage or task-driven viewpoints. This limitation leads to suboptimal viewpoint selection, particularly in constrained environments such as nuclear decommissioning. To bridge this gap, we introduce a novel framework that integrates expert operator preferences into the active 3D scene representation pipeline. Specifically, we employ Reinforcement Learning from Human Feedback (RLHF) to guide robotic path planning, reshaping the reward function based on expert input. To capture operator-specific priorities, we conduct interactive choice experiments that evaluate user preferences in 3D scene representation. We validate our framework using a UR3e robotic arm for reactor tile inspection in a nuclear decommissioning scenario. Compared to baseline methods, our approach enhances scene representation while optimizing trajectory efficiency. The RLHF-based policy consistently outperforms random selection, prioritizing task-critical details. By unifying explicit 3D geometric modeling with implicit human-in-the-loop optimization, this work establishes a foundation for adaptive, safety-critical robotic perception systems, paving the way for enhanced automation in nuclear decommissioning, remote maintenance, and other high-risk environments.

Paper Structure

This paper contains 14 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Motivation: Traditional methods rely on static metrics without considering task- and preference-based 3D representations.
  • Figure 2: Overview of the proposed framework. The pipeline consists of five key stages: (1) Robotic exploration and trajectory planning, where the robotic system collects observations; (2) Expert operator preference evaluation, where operators select preferred scene representations; (3) Learning a reward model based on collected human feedback; (4) Policy optimization using reinforcement learning algorithms; and (5) Online training, where new data continuously refines the learned policy for improved viewpoint selection.
  • Figure 3: Illustration of the user interface for preference-based 3D scene selection and sequence representation. The interface allows users to compare and select preferred 3D scene representations, which are used to train a reward predictor for viewpoint optimization. Users can zoom, rotate, and inspect models for detailed evaluation. Note that the illustrated line indicates the viewpoint selection order, not the actual robotic motion. The real UR3e motion is planned using the Isaac Sim motion planner for smooth and optimized execution.
  • Figure 4: Experimental setup of the RLHF-based 3D scene representation system. The setup consists of a UR3e robotic arm with an Intel RealSense D435i camera, controlled via a ROS-based framework. A control server handles motion execution, while a separate server optimizes viewpoint selection based on human feedback. Efficient data transfer enables real-time policy refinement for 3D scene representation. Our demo video is available at https://youtu.be/mAAipFOotx8.
  • Figure 5: Convergence performance of our proposed framework across different 3D scene representation methods.
  • ...and 2 more figures