Amplifying robotics capacities with a human touch: An immersive low-latency panoramic remote system
Junjie Li, Kang Li, Dewei Han, Jian Xu, Zhaoyuan Ma
TL;DR
The paper presents Avatar, an immersive, low-latency panoramic remote-operation platform designed to improve human–robot collaboration. It integrates a six-camera panoramic sensor, edge computing, 5G networking, cloud-based AI (YOLO, ORB-SLAM3), and a VR-driven client to deliver 360° perception, with a reported event-to-eye latency of 357 ms under favorable network conditions. The authors validate the architecture with experiments that measure event-to-eye latency and demonstrate real-time remote control of a mobile robot with an expandable robotic arm, along with map recording and autonomous navigation. The work highlights practical benefits for long-distance operation and hazardous environments, and outlines avenues for latency optimization and broader applications in industry and healthcare.
Abstract
AI and robotics technologies have advanced remarkably over the past decade, transforming work patterns and opening opportunities across many domains. Their application has propelled society towards an era of symbiosis between humans and machines. To facilitate efficient communication between humans and intelligent robots, we propose the "Avatar" system, an immersive, low-latency panoramic human-robot interaction platform. We have designed and tested a prototype rugged mobile platform that integrates edge computing units, panoramic video capture devices, power batteries, robot arms, and network communication equipment. Under favorable network conditions, we achieved a high-definition panoramic visual experience with a delay of 357 ms. Operators can use VR headsets and controllers for real-time immersive control of robots and devices. The system enables remote control over large physical distances, spanning campuses, provinces, countries, and even continents (for example, New York to Shenzhen). Additionally, the system incorporates visual SLAM for map and trajectory recording, which provides autonomous navigation capabilities. We believe that this intuitive platform can improve efficiency and situational awareness in human-robot collaboration and that, with further advances in related technologies, it will become a versatile tool for efficient and symbiotic cooperation between AI and humans.
