Improving Robotic Arms through Natural Language Processing, Computer Vision, and Edge Computing
Pascal Sikorski, Kaleb Yu, Lucy Billadeau, Flavio Esposito, Hadi AliAkbarpour, Madi Babaiasl
TL;DR
The paper addresses intuitive, reliable human–robot interaction for assistive robotic arms. It proposes an edge-enabled architecture that combines offline speech-to-text, language understanding via large language models (LLMs), and computer vision to interpret natural language commands and execute them through ROS-controlled manipulation. A proof of concept on a 4-DOF robotic arm achieves accurate intent interpretation and a 100% object manipulation success rate in color-based tasks, with offline, edge-based processing reducing latency and improving privacy. The results indicate that combining language understanding with perception yields responsive, user-centric assistive robots. Future work targets user studies with participants who have disabilities, running the LLM locally, fully offline operation, and upgrades to the vision pipeline. Overall, this work lays groundwork for autonomous, private, and adaptable assistive robots that better support independence and quality of life for individuals with disabilities.
Abstract
This paper introduces a prototype for a new approach to assistive robotics, integrating edge computing with Natural Language Processing (NLP) and computer vision to enhance the interaction between humans and robotic systems. Our proof of concept demonstrates the feasibility of using large language models (LLMs) and vision systems in tandem for interpreting and executing complex commands conveyed through natural language. This integration aims to improve the intuitiveness and accessibility of assistive robotic systems, making them more adaptable to the nuanced needs of users with disabilities. By leveraging the capabilities of edge computing, our system has the potential to minimize latency and support offline capability, enhancing the autonomy and responsiveness of assistive robots. Experimental results from our implementation on a robotic arm show promising outcomes in terms of accurate intent interpretation and object manipulation based on verbal commands. This research lays the groundwork for future developments in assistive robotics, focusing on creating highly responsive, user-centric systems that can significantly improve the quality of life for individuals with disabilities. For video demonstrations and source code, please refer to: https://tinyurl.com/EnhancedArmEdgeNLP.
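As a rough illustration of how language understanding and perception might be combined, the sketch below parses a structured LLM reply into an intent and matches it against color detections from a vision stage. It is a minimal sketch under stated assumptions, not the authors' implementation: the JSON reply format, the `Intent` and `Detection` types, and the functions `parse_intent` and `select_target` are all hypothetical names introduced here, and the speech-to-text, LLM, camera, and ROS stages are omitted.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Intent:
    """Hypothetical structured intent extracted from the LLM reply."""
    action: str  # e.g. "pick"
    color: str   # e.g. "red"


@dataclass
class Detection:
    """Hypothetical output of a color-based vision stage."""
    color: str
    centroid: Tuple[int, int]  # (x, y) in image pixels
    area: float                # blob area in pixels


def parse_intent(llm_reply: str) -> Optional[Intent]:
    """Parse an LLM reply assumed to be JSON like {"action": ..., "color": ...}.

    Returns None when the reply is not valid JSON or lacks the expected fields.
    """
    try:
        data = json.loads(llm_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    action, color = data.get("action"), data.get("color")
    if not isinstance(action, str) or not isinstance(color, str):
        return None
    return Intent(action.lower(), color.lower())


def select_target(intent: Intent, detections: List[Detection]) -> Optional[Detection]:
    """Return the largest detection whose color matches the intent, or None."""
    matches = [d for d in detections if d.color == intent.color]
    return max(matches, key=lambda d: d.area, default=None)
```

In a full system, the selected detection's pixel centroid would still have to be converted into arm coordinates and passed to the manipulation stack. Rejecting malformed replies, rather than guessing at them, keeps the arm from acting on a misinterpreted command.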
