A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing
Prasun Roy, Subhankar Ghosh, Umapada Pal
TL;DR
This paper addresses the recognition of unistroke numerals written in the air using a low-cost, single-camera setup. It introduces a marker-based pipeline that segments a fixed-color marker, reconstructs a 2D trajectory via a velocity-based motion model with $N_{FPS}=1/t_{update}$ and a threshold $v_T$, and classifies the projected trajectory with a CNN pre-trained on MNIST and fine-tuned on air-written data for English, Bengali, and Devanagari numerals. The key contributions are marker-based segmentation that avoids skin-tone variability, an end-to-end air-writing pipeline, and multilingual recognition via transfer learning, reaching 97.7% (English), 95.4% (Bengali), and 93.7% (Devanagari) accuracy under stable illumination. The approach needs no depth sensor, so it can be integrated into common camera-equipped devices; markerless operation and more flexible usage are left as future work. Overall, the work advances practical, accessible air-writing recognition and highlights the benefit of domain adaptation from handwritten to air-written numeral data.
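The role of $N_{FPS}=1/t_{update}$ and $v_T$ in the motion model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the marker tip position is already available for every frame, that speed is estimated as displacement divided by $t_{update}$, and that frames with speed below $v_T$ are treated as stationary and end the current trajectory. The function name `segment_strokes` and the pixel-per-second units are hypothetical.

```python
import math


def segment_strokes(points, n_fps, v_t):
    """Split tracked marker-tip positions into moving segments.

    points: list of (x, y) pixel coordinates, one per frame.
    n_fps:  camera frame rate, so t_update = 1 / n_fps seconds.
    v_t:    speed threshold in pixels per second (assumed units).
    """
    t_update = 1.0 / n_fps
    segments, current = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Speed estimate between two consecutive frames.
        speed = math.hypot(x1 - x0, y1 - y0) / t_update
        if speed >= v_t:
            # Marker is moving: start or extend the current trajectory.
            if not current:
                current.append((x0, y0))
            current.append((x1, y1))
        elif current:
            # Marker is (assumed) stationary: close the current trajectory.
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Toy track: two idle frames, a short stroke, then idle again.
points = [(0, 0), (0, 0), (3, 4), (6, 8), (6, 8), (6, 8)]
segments = segment_strokes(points, n_fps=30, v_t=60.0)
```

With these toy values the two 5-pixel steps exceed the threshold and form a single segment, while the idle frames before and after are dropped. In a real system $v_T$ would have to be tuned to the camera resolution and frame rate.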
Abstract
Air-writing refers to virtually writing linguistic characters through hand gestures in three-dimensional space with six degrees of freedom. This paper proposes a CNN-based air-writing framework that relies only on a generic video camera. Gestures are performed with a marker of fixed color in front of the camera; color-based segmentation then identifies the marker and tracks the trajectory of its tip. A pre-trained convolutional neural network (CNN) classifies the gesture, and recognition accuracy is further improved by transfer learning on the newly acquired data. Because the segmentation is color-based, the performance of the system depends significantly on the illumination conditions. Under stable illumination, the system is able to recognize isolated unistroke numerals of multiple languages. The proposed framework achieves recognition rates of 97.7%, 95.4% and 93.7% in person-independent evaluations on English, Bengali and Devanagari numerals, respectively.
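The color-based segmentation step described in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: the frame is a small nested list of RGB tuples, the marker is assumed to be blue, the HSV thresholds are hypothetical, and the marker tip is assumed to be the topmost pixel that matches the marker color. The helper name `find_marker_tip` is likewise hypothetical.

```python
import colorsys


def find_marker_tip(frame, hue_range=(0.55, 0.70), min_sat=0.5, min_val=0.3):
    """Return (row, col) of the topmost marker-colored pixel, or None.

    frame: 2D list of (r, g, b) tuples with channels in 0..255.
    hue_range, min_sat, min_val: assumed HSV thresholds for a blue marker.
    """
    for r, row in enumerate(frame):
        for c, (red, green, blue) in enumerate(row):
            # colorsys expects channels scaled to the range 0..1.
            h, s, v = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
            if hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val:
                # Rows are scanned top to bottom, so the first hit is topmost.
                return (r, c)
    return None


# Toy 3x3 frame: gray background with a vertical blue marker.
bg, marker = (128, 128, 128), (0, 0, 255)
frame = [
    [bg, bg, bg],
    [bg, marker, bg],
    [bg, marker, bg],
]
tip = find_marker_tip(frame)
```

Fixed thresholds of this kind stop matching the marker when lighting shifts its apparent saturation or brightness, which is consistent with the illumination sensitivity noted in the abstract.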
