Robustness Evaluation of Machine Learning Models for Robot Arm Action Recognition in Noisy Environments

Elaheh Motamedi; Kian Behzad; Rojin Zandi; Hojjat Salehinejad; Milad Siami

TL;DR

This work tackles robust robot arm action recognition from noisy visual data by combining a pretrained 2-D pose estimator with a CNN-based action classifier. The pipeline extracts time-series keypoint trajectories $A^{(j)} \in \mathbb{R}^{K \times T}$ from each of $J$ cameras and uses a CNN with 1-D convolutions to classify each action as one of the nine Tic-Tac-Toe board locations, with Transformer and Rocket models as baselines. On a Franka Emika arm dataset collected for a 3×3 Tic-Tac-Toe task, the CNN achieves about 98% test accuracy and remains highly robust to noise, while Salt-and-Pepper noise degrades all models the most. The findings demonstrate the practical viability of robust, vision-based robot arm action recognition in noisy environments and provide guidance for selecting architectures under adverse conditions.
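To make the data shapes in the TL;DR concrete, here is a minimal NumPy sketch of a 1-D CNN forward pass over keypoint trajectories. It assumes the $J$ per-camera arrays $A^{(j)} \in \mathbb{R}^{K \times T}$ are concatenated along the keypoint axis; the layer count, channel width, kernel width, global average pooling, and all helper names (`stack_cameras`, `conv1d`, `init_params`, `forward`) are hypothetical choices for illustration, not the authors' architecture. The weights are random, so the predicted location is meaningless until the model is trained.

```python
import numpy as np

# Sketch only: camera fusion, layer sizes and helper names are assumptions,
# not the paper's implementation.
NUM_CLASSES = 9  # one class per Tic-Tac-Toe board location


def stack_cameras(trajectories):
    """Concatenate J per-camera arrays A^(j) of shape (K, T) into one (J*K, T) input.

    Stacking cameras along the channel axis is an assumption of this sketch.
    """
    return np.concatenate(trajectories, axis=0)


def conv1d(x, w, b):
    """'Valid' 1-D convolution (cross-correlation, as in deep-learning libraries).

    x: (C_in, T), w: (C_out, C_in, L), b: (C_out,) -> (C_out, T - L + 1)
    """
    length = w.shape[2]
    # windows[c, t, l] == x[c, t + l]
    windows = np.lib.stride_tricks.sliding_window_view(x, length, axis=1)
    return np.einsum("ctl,ocl->ot", windows, w) + b[:, None]


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_params(c_in, hidden=16, kernel=5, num_classes=NUM_CLASSES, seed=0):
    """Random, untrained weights; all sizes here are hypothetical."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (hidden, c_in, kernel)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, hidden, kernel)),
        "b2": np.zeros(hidden),
        "wf": rng.normal(0.0, 0.1, (num_classes, hidden)),
        "bf": np.zeros(num_classes),
    }


def forward(params, x):
    """Two conv + ReLU layers, global average pooling over time, linear + softmax head."""
    h = relu(conv1d(x, params["w1"], params["b1"]))
    h = relu(conv1d(h, params["w2"], params["b2"]))
    pooled = h.mean(axis=1)  # (hidden,): collapses the time axis
    return softmax(params["wf"] @ pooled + params["bf"])


# Hypothetical sizes: J = 2 cameras, K = 14 keypoint coordinates, T = 60 frames.
J, K, T = 2, 14, 60
rng = np.random.default_rng(1)
x = stack_cameras([rng.normal(size=(K, T)) for _ in range(J)])
probs = forward(init_params(c_in=x.shape[0]), x)
print(probs.shape)  # (9,)
print(int(probs.argmax()))  # predicted board location in 0..8
```

Global average pooling makes the classifier head independent of the sequence length $T$, so the same weights accept clips of different durations as long as they are long enough for both convolutions (at least 9 frames with the kernel width used here).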

Abstract

In robot action recognition, identifying distinct but spatially proximate arm movements with vision systems in noisy environments poses a significant challenge. This paper studies robot arm action recognition in noisy environments using machine learning techniques. Specifically, a vision system tracks the robot's movements, and a deep learning model then extracts the arm's key points. Through a comparative analysis of machine learning methods, the effectiveness and robustness of this model are assessed in noisy environments. A case study was conducted using the game of Tic-Tac-Toe on a 3-by-3 grid, where the goal is to accurately identify the arm's actions in selecting specific locations within this constrained environment. Experimental results show that our approach can achieve precise key point detection and action classification despite the noise and uncertainties added to the dataset.
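The robustness assessment described in the abstract amounts to comparing a classifier's accuracy on clean inputs with its accuracy on corrupted copies of the same inputs. The NumPy sketch below shows that comparison under stated assumptions: the helper names (`cell_of`, `accuracy`, `robustness_report`), the row-major numbering of the nine board cells, and the toy data and stand-in "model" are all hypothetical, and the paper's own evaluation protocol may differ.

```python
import numpy as np

# Sketch only: helper names, cell numbering and toy data are assumptions.


def cell_of(label):
    """Map a class index 0..8 to a (row, col) board cell; row-major order is assumed."""
    if not 0 <= label < 9:
        raise ValueError("label must be in 0..8")
    return divmod(int(label), 3)


def accuracy(predict, X, y):
    """Fraction of samples whose predicted class equals the true label."""
    preds = np.array([predict(x) for x in X])
    return float(np.mean(preds == np.asarray(y)))


def robustness_report(predict, X, y, noise_fn, seed=0):
    """Accuracy on clean inputs versus noise-corrupted copies of the same inputs."""
    rng = np.random.default_rng(seed)
    clean = accuracy(predict, X, y)
    noisy = accuracy(predict, [noise_fn(x, rng) for x in X], y)
    return {"clean": clean, "noisy": noisy, "drop": clean - noisy}


# Toy stand-ins (hypothetical): each sample is a (9, T) array whose label row
# is 1 and all other rows are 0; the "model" picks the row with the largest mean.
def toy_predict(x):
    return int(np.argmax(x.mean(axis=1)))


def toy_gaussian(x, rng, sigma=0.5):
    return x + rng.normal(0.0, sigma, size=x.shape)


T = 30
labels = np.random.default_rng(42).integers(0, 9, size=50)
samples = [np.tile(np.eye(9)[k][:, None], (1, T)) for k in labels]
report = robustness_report(toy_predict, samples, labels, toy_gaussian)
print(report["clean"])  # 1.0
print(cell_of(4))  # (1, 1): the center cell
```

Reusing one seeded generator for all corruptions keeps the noisy accuracy reproducible across runs, which matters when comparing several models on the same corrupted inputs.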

Paper Structure

This paper contains 11 sections, 5 figures, and 3 tables.

Introduction
Method
Robot Arm Pose Recognition
Robot Arm Action Recognition
Experiments
Data Collection
Training Setup
Result Analysis
Baseline Models
Evaluation in Noisy Environments
Conclusion

Figures (5)

Figure 1: Architecture of the robot arm action recognition pipeline in noisy environments. Noise is applied after ResNet-50 estimates the key points of the robot arm.
Figure 2: Colored circles mark the detected key points; $j_4$ and $j_6$ are not marked because they are not visible in this frame.
Figure 3: Data collection using one robot arm and two cameras.
Figure 4: Dataset samples: the first row was captured by camera one and the second row by camera two, as depicted in Figure 3.
Figure 5: Comparison of the Transformer, Rocket, and CNN models under different noise scenarios, with noise applied during training, during testing, or both (values in percent). Top row: Cut-out noise; middle row: Salt-and-Pepper noise; bottom row: Gaussian noise.
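Figure 1 places the noise after key-point estimation, so the three corruptions compared in Figure 5 can be illustrated as operations on a keypoint array of shape $K \times T$. The NumPy sketch below is one plausible reading, not the authors' code: the parameter values (`sigma`, `p`, `width`), using the array's minimum and maximum as the pepper and salt values, cutting out a window along the time axis, and the helper names (`gaussian_noise`, `salt_and_pepper`, `cutout`, `apply_scenario`) are all assumptions. `apply_scenario` mirrors the three settings in Figure 5: noise during training, during testing, or both.

```python
import numpy as np

# Sketch only: noise parameters, salt/pepper values, the cut-out axis and
# helper names are assumptions, not the paper's implementation.


def gaussian_noise(a, rng, sigma=0.05):
    """Add zero-mean Gaussian noise with standard deviation sigma to every entry."""
    return a + rng.normal(0.0, sigma, size=a.shape)


def salt_and_pepper(a, rng, p=0.05):
    """Replace a fraction p of entries with the array's max ("salt") or min ("pepper")."""
    out = a.copy()
    hit = rng.random(a.shape) < p
    salt = rng.random(a.shape) < 0.5
    out[hit & salt] = a.max()
    out[hit & ~salt] = a.min()
    return out


def cutout(a, rng, width=10):
    """Zero one random window of `width` consecutive frames for all keypoints."""
    out = a.copy()
    t = a.shape[-1]
    width = min(width, t)
    start = int(rng.integers(0, t - width + 1))
    out[..., start:start + width] = 0.0
    return out


def apply_scenario(train, test, noise_fn, scenario, seed=0):
    """Corrupt the training set, the test set, or both ('train', 'test', 'both')."""
    if scenario not in ("train", "test", "both"):
        raise ValueError(f"unknown scenario: {scenario!r}")
    rng = np.random.default_rng(seed)
    if scenario in ("train", "both"):
        train = [noise_fn(a, rng) for a in train]
    if scenario in ("test", "both"):
        test = [noise_fn(a, rng) for a in test]
    return train, test


rng = np.random.default_rng(0)
a = rng.normal(size=(14, 60))  # hypothetical (K, T) keypoint trajectory
cut = cutout(a, rng, width=10)
print(int(np.all(cut == 0.0, axis=0).sum()))  # 10 zeroed frames
train, test = apply_scenario([a], [a], salt_and_pepper, "test")
print(train[0] is a)  # True: training data stays clean in the "test" scenario
```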
