Robustness Evaluation of Machine Learning Models for Robot Arm Action Recognition in Noisy Environments
Elaheh Motamedi, Kian Behzad, Rojin Zandi, Hojjat Salehinejad, Milad Siami
TL;DR
This work tackles robust robot arm action recognition from noisy visual data by combining a pretrained 2-D pose estimator with a CNN-based action classifier. For each of the $J$ cameras, the pose estimator produces a time-series keypoint trajectory $A^{(j)} \in \mathbb{R}^{K \times T}$; a CNN with 1-D convolutions then classifies each action as one of nine Tic-Tac-Toe board locations and is compared against Transformer and Rocket baselines. On a Franka Emika arm dataset collected for a 3×3 Tic-Tac-Toe task, the CNN achieves about 98% test accuracy and remains highly robust to noise, while Salt-and-Pepper noise degrades all models the most. The findings demonstrate the practical viability of robust, vision-based robot arm action recognition in noisy environments and provide guidance for selecting architectures under adverse conditions.
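To make the shapes in the classification step concrete, the following is a minimal NumPy sketch of a 1-D convolutional classifier applied to a single trajectory $A^{(j)} \in \mathbb{R}^{K \times T}$. It is not the authors' architecture: the single convolutional layer, the global average pooling, the random weights, and the values chosen for `K`, `T`, `F`, and `k` are all assumptions made for illustration. Only the input shape and the nine output classes come from the text.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D cross-correlation along the time axis.

    x: (K, T) trajectory, w: (F, K, k) filters, b: (F,) biases.
    Returns an array of shape (F, T - k + 1).
    """
    k = w.shape[2]
    # windows has shape (K, T - k + 1, k): every length-k slice along time.
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)
    return np.einsum("ktw,fkw->ft", windows, w) + b[:, None]

def classify(a, params):
    """Map one (K, T) trajectory to probabilities over 9 board cells."""
    h = np.maximum(conv1d(a, params["w"], params["b"]), 0.0)  # ReLU
    pooled = h.mean(axis=1)                      # global average pool -> (F,)
    logits = params["v"] @ pooled + params["c"]  # linear layer -> (9,)
    z = np.exp(logits - logits.max())            # numerically stable softmax
    return z / z.sum()

# Hypothetical sizes: K rows per trajectory, T time steps, F filters, width k.
K, T, F, k = 12, 50, 8, 5
rng = np.random.default_rng(0)
params = {
    "w": rng.normal(size=(F, K, k)),
    "b": np.zeros(F),
    "v": rng.normal(size=(9, F)),
    "c": np.zeros(9),
}
trajectory = rng.normal(size=(K, T))  # stand-in for a real A^(j)
probs = classify(trajectory, params)
predicted_cell = int(np.argmax(probs))  # index 0..8 of the board location
```

Because the filters slide only along the time axis, each one sees all `K` rows at once, which is what distinguishes a 1-D convolution over a trajectory from a 2-D convolution over an image.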
Abstract
In robot action recognition, identifying distinct but spatially proximate arm movements with vision systems in noisy environments poses a significant challenge. This paper studies robot arm action recognition in noisy environments using machine learning techniques. Specifically, a vision system tracks the robot's movements, and a deep learning model then extracts the arm's key points. Through a comparative analysis of machine learning methods, the effectiveness and robustness of this model are assessed in noisy environments. A case study is conducted using the Tic-Tac-Toe game on a 3-by-3 grid, where the goal is to accurately identify which grid location the arm selects within this constrained environment. Experimental results show that our approach achieves precise key point detection and action classification despite the addition of noise and uncertainties to the dataset.
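As an illustration of the kind of corruption involved, the sketch below applies Salt-and-Pepper noise to a single camera frame with NumPy. It assumes the noise is injected at the image level on 8-bit frames and splits the corrupted pixels evenly between black and white; the function name `salt_and_pepper`, the `amount` parameter, and the frame size are hypothetical, and the paper's actual noise model and noise levels may differ.

```python
import numpy as np

def salt_and_pepper(image, amount, rng):
    """Return a copy of an 8-bit image with a fraction `amount` of pixels corrupted.

    Roughly half of the corrupted pixels are set to 0 (pepper) and
    half to 255 (salt). The input image is left unchanged.
    """
    noisy = image.copy()
    # One uniform draw per pixel location, shared across color channels.
    u = rng.random(image.shape[:2])
    noisy[u < amount / 2] = 0          # pepper
    noisy[u > 1 - amount / 2] = 255    # salt
    return noisy

rng = np.random.default_rng(0)
# Stand-in for a camera frame: a uniform mid-gray RGB image.
frame = np.full((64, 64, 3), 128, dtype=np.uint8)
noisy = salt_and_pepper(frame, amount=0.2, rng=rng)
```

Sweeping `amount` over a range of values and re-running key point extraction and classification on the corrupted frames is one way such a robustness comparison could be set up.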
