A High-Fidelity Robotic Manipulator Teleoperation Framework for Human-Centered Augmented Reality Evaluation

Harsh Chhajed; Tian Guo

TL;DR

ARBot tackles the reproducibility gap in AR evaluation by providing a deterministic physical proxy: it captures natural human hand motion through multimodal interfaces (the ARPose mobile app and a CV+IMU pipeline) and replays it on robotic manipulators through a proactively-safe control stack. The core methods combine a Newton-Raphson inverse kinematics (IK) solver, a convex quadratic programming (QP) safety filter, and position-output command integration, achieving sub-centimeter median accuracy (errors of roughly $5$–$20$ mm) with low latency ($\approx 20$ ms for ARPose, up to $\sim 90$ ms for CV+IMU). Through an IRB-approved user study (N=11) and a 132-trajectory dataset, the paper shows that ARBot achieves high fidelity and repeatability, with lower variability than human-performed motion, enabling standardized benchmarking of AR tracking and interaction. The work contributes an open-source end-to-end platform, a rich dataset, and methodological groundwork for reproducible, human-centered AR evaluation, with practical impact on AR system development and benchmarking.
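The Newton-Raphson IK step can be illustrated with a minimal sketch. It assumes a hypothetical planar two-link arm (the link lengths `LINK_1` and `LINK_2` are made-up values, not ARBot's manipulator) and solves for end-effector position only; a 6-DOF solver such as ARBot's would also handle orientation and use the full manipulator Jacobian. Each iteration linearizes the forward kinematics and applies $q \leftarrow q - J(q)^{-1}\,(\mathrm{FK}(q) - p^{*})$. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical link lengths in metres for a planar two-link arm (not ARBot's robot).
LINK_1, LINK_2 = 0.30, 0.25


def forward_kinematics(q: Sequence[float]) -> Tuple[float, float]:
    """End-effector position (x, y) for joint angles q = (q1, q2) in radians."""
    q1, q2 = q
    x = LINK_1 * math.cos(q1) + LINK_2 * math.cos(q1 + q2)
    y = LINK_1 * math.sin(q1) + LINK_2 * math.sin(q1 + q2)
    return x, y


def jacobian(q: Sequence[float]) -> List[List[float]]:
    """2x2 Jacobian of forward_kinematics with respect to (q1, q2)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-LINK_1 * s1 - LINK_2 * s12, -LINK_2 * s12],
            [LINK_1 * c1 + LINK_2 * c12, LINK_2 * c12]]


def newton_raphson_ik(target: Sequence[float], q0: Sequence[float],
                      tol: float = 1e-9, max_iter: int = 50) -> Tuple[List[float], bool]:
    """Solve FK(q) = target via q <- q - J(q)^-1 (FK(q) - target); returns (q, converged)."""
    q = list(q0)
    for _ in range(max_iter):
        x, y = forward_kinematics(q)
        ex, ey = x - target[0], y - target[1]
        if math.hypot(ex, ey) < tol:
            return q, True
        (a, b), (c, d) = jacobian(q)
        det = a * d - b * c  # equals LINK_1 * LINK_2 * sin(q2)
        if abs(det) < 1e-12:  # singular configuration: stop instead of dividing by ~0
            return q, False
        # Newton step using the closed-form 2x2 inverse [[d, -b], [-c, a]] / det.
        q[0] -= (d * ex - b * ey) / det
        q[1] -= (-c * ex + a * ey) / det
    return q, False


q, ok = newton_raphson_ik(target=(0.35, 0.20), q0=(0.0, 1.2))
print(ok, [round(v, 4) for v in q])
```

Plain Newton-Raphson converges quickly from a nearby initial guess (for example, the previous control cycle's solution) but can diverge or stall near singularities (here, `q2` close to 0 or π), which is why practical IK solvers usually add damping or step limits.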

Abstract

Validating Augmented Reality (AR) tracking and interaction models requires precise, repeatable ground-truth motion. However, human users cannot reliably reproduce the same motion because of biomechanical variability. Robotic manipulators are promising human motion proxies, provided they can faithfully mimic human movements. In this work, we design and implement ARBot, a real-time teleoperation platform that captures natural human motion and accurately replays it on robotic manipulators. ARBot supports two capture modes: stable wrist motion capture via a custom computer vision (CV) and IMU pipeline, and natural 6-DOF control via a mobile application. We design a proactively-safe quadratic programming (QP) controller that ensures smooth, jitter-free execution on the manipulator, enabling it to function as a high-fidelity record-and-replay physical proxy. We open-source ARBot and release a benchmark dataset of 132 human and synthetic trajectories captured with ARBot to support controllable and scalable AR evaluation.
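To illustrate what a QP safety filter does, the sketch below handles the simplest case, which is an assumption rather than ARBot's actual formulation: the objective is $\min_v \lVert v - v_{\text{des}} \rVert^2$ over joint velocities, and the only constraints are per-joint speed limits and joint-position limits one control step ahead. With an identity Hessian and box constraints the QP decouples per joint, so its exact solution is clipping; a controller with coupled constraints (for example, Cartesian workspace bounds) would need a general QP solver. The second function shows position-output integration, turning the safe velocity into the next joint position command. All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def qp_safety_filter(v_des: Sequence[float], q: Sequence[float],
                     q_min: Sequence[float], q_max: Sequence[float],
                     v_max: Sequence[float], dt: float) -> List[float]:
    """Solve  min_v ||v - v_des||^2
       s.t.   -v_max <= v <= v_max  and  q_min <= q + v*dt <= q_max   (per joint).

    Identity Hessian + box constraints => the QP splits into independent 1-D problems,
    each solved exactly by clipping v_des to its feasible interval.
    Assumes q already lies within [q_min, q_max], so every interval is non-empty.
    """
    v_safe = []
    for vd, qi, lo, hi, vm in zip(v_des, q, q_min, q_max, v_max):
        lower = max(-vm, (lo - qi) / dt)  # speed limit, or lower joint limit after one step
        upper = min(vm, (hi - qi) / dt)   # speed limit, or upper joint limit after one step
        v_safe.append(min(max(vd, lower), upper))
    return v_safe


def integrate_to_position(q: Sequence[float], v_safe: Sequence[float], dt: float) -> List[float]:
    """Position-output integration: next joint position command = q + v_safe * dt."""
    return [qi + vi * dt for qi, vi in zip(q, v_safe)]


q_now = [0.0, 1.5]
v = qp_safety_filter(v_des=[2.0, 0.5], q=q_now, q_min=[-1.0, -1.0],
                     q_max=[1.0, 1.502], v_max=[1.0, 1.0], dt=0.01)
q_cmd = integrate_to_position(q_now, v, dt=0.01)
```

Because the filter returns the feasible velocity closest to the request, commands that already satisfy every limit pass through unchanged; the filter only intervenes as a joint approaches a speed or position limit.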

Paper Structure (33 sections, 3 equations, 6 figures)

Introduction
Related Work
System Design
Capturing Interfaces
ARPose (Mobile Interface).
CV+IMU (Touchless Interface).
Control Architecture
Geometric Solver (Newton-Raphson)
Safety Filter (Quadratic Programming).
Command Integration (Position Output).
Implementation
Coordinate Homogenization
Low-Latency Network Stack
Evaluation Package
Evaluation
...and 18 more sections
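The outline lists a coordinate homogenization step. As a hedged illustration only: assuming the mobile AR world frame is right-handed and y-up with −z pointing forward (the usual OpenGL-style convention), and a robot base frame with x forward, y left, z up, positions can be mapped with a fixed rotation applied to displacements from the pose at which teleoperation started. The frame conventions, the `scale` parameter, and all function names are assumptions, not ARBot's documented behavior.

```python
from typing import List, Sequence

# Hypothetical frame conventions: AR world frame is right-handed, y up, -z forward;
# robot base frame is x forward, y left, z up. Row i expresses robot axis i in AR coordinates.
R_AR_TO_ROBOT = [
    [0.0, 0.0, -1.0],   # robot x (forward) = -AR z
    [-1.0, 0.0, 0.0],   # robot y (left)    = -AR x
    [0.0, 1.0, 0.0],    # robot z (up)      = +AR y
]


def mat_vec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """3x3 matrix times 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def ar_to_robot_target(p_ar: Sequence[float], p_ar_origin: Sequence[float],
                       p_robot_home: Sequence[float], scale: float = 1.0) -> List[float]:
    """Relative mapping: robot target = home + scale * R * (p_ar - p_ar_origin)."""
    delta_ar = [a - b for a, b in zip(p_ar, p_ar_origin)]
    delta_robot = mat_vec(R_AR_TO_ROBOT, delta_ar)
    return [h + scale * d for h, d in zip(p_robot_home, delta_robot)]


# Phone moved 0.10 m forward (-z), 0.05 m up (+y), 0.02 m right (+x) since teleoperation began.
target = ar_to_robot_target(p_ar=[0.02, 0.05, -0.10], p_ar_origin=[0.0, 0.0, 0.0],
                            p_robot_home=[0.40, 0.00, 0.30])
```

Mapping displacements rather than absolute positions keeps the robot anchored at its home pose regardless of where the AR session origin happens to be. Orientations would be converted analogously by conjugating with the same rotation.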

Figures (6)

Figure 1: ARBot in Action. ARBot serves as a physical proxy for AR evaluation, capturing natural human motion via the mobile ARPose interface (left) or the CV+IMU interface (right) and replaying it on robotic manipulators (top).
Figure 2: System Architecture. The pipeline captures human intent via multimodal interfaces, processes it through a QP safety controller, and executes it on the robotic manipulator, ensuring high-fidelity reproduction of user motion.
Figure 3: System Latency CDF. The ARPose interface shows consistently lower latency ($<40$ ms), while CV+IMU exhibits a long tail due to processing overhead.
Figure 4: Tracking Error (ATE) CDF. Both methods achieve a median error of about 5 mm, but CV+IMU has a lower 95th-percentile (P95) error, indicating better robustness to tracking outliers.
Figure 5: Spatial Dynamics Analysis. Columns 1–3 show error heatmaps for the Square, Circle, and S-Shape trajectories; blue indicates low error ($<7.5$ mm) and red indicates higher error ($<20$ mm). Column 4 shows the temporal evolution of position error.
...and 1 more figures
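The latency and tracking-error figures summarize per-sample measurements as CDFs with median and P95 values. A minimal sketch of that kind of analysis follows, assuming the reference and measured trajectories are already time-aligned, expressed in the same frame, and given in millimetres (the alignment procedure is not described here). Absolute trajectory error (ATE) is often reported as an RMSE over such per-sample errors; a CDF plot instead shows their full distribution. Latency samples can be passed to the same `percentile` and `empirical_cdf` helpers. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def position_errors(reference: Sequence[Point], measured: Sequence[Point]) -> List[float]:
    """Per-sample Euclidean error between time-aligned trajectories in a shared frame."""
    if len(reference) != len(measured):
        raise ValueError("trajectories must be time-aligned and of equal length")
    return [math.dist(r, m) for r, m in zip(reference, measured)]


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100 (p=50 for the median, p=95 for P95)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """(sorted value, cumulative fraction) pairs: the step points a CDF plot draws."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


errors_mm = position_errors(reference=[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
                            measured=[(3.0, 4.0, 0.0), (10.0, 0.0, 6.0)])
print(percentile(errors_mm, 50), percentile(errors_mm, 95))
```

The nearest-rank definition always returns an observed sample; other percentile definitions interpolate between samples and can give slightly different P95 values on small datasets.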