An Approach for Air Drawing Using Background Subtraction and Contour Extraction

Ramkrishna Acharya

TL;DR

The paper addresses air drawing as an input modality by using ROI-based background subtraction and contour extraction to locate a drawing pointer for on-screen rendering. It proposes an end-to-end image-processing pipeline that builds a per-ROI background via running average and determines the pointer as the top point of the largest contour in the residual image, with the pointer coordinates mapped to a drawing canvas. The method integrates a Haar cascade hand detector and Tesseract OCR, achieving about 100 ms latency on a standard webcam and an OCR accuracy of around 98%. The approach is presented as a simpler, cheaper, and faster alternative to sensor-based systems, with potential applications in sign language, in-air writing, and gesture-based interfaces, and it suggests future integration with image completion for extended capabilities.

Abstract

In this paper, we propose a novel approach for air drawing that uses image processing techniques to draw on the screen by moving fingers in the air. This approach benefits a wide range of applications such as sign language, in-air drawing, and 'writing' in the air as a new way of input. The approach starts by preparing ROI (Region of Interest) background images, taking a running average over the initial camera frames, and later subtracting that background from the live camera frames to get a binary mask image. We calculate the pointer's position as the top of the contour on the binary image. Drawing a circle on the canvas at that position simulates the drawing. Furthermore, we integrate the pre-trained Tesseract model for OCR. To address false contours, we perform Haar cascade-based hand detection before the background subtraction. In an experimental setup, we achieved a latency of only 100 ms in air drawing. The code used in this research is available on GitHub at https://github.com/q-viper/Contour-Based-Writing
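
The pipeline described in the abstract (running-average background, subtraction to a binary mask, pointer at the top of the contour) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses NumPy only, substitutes a breadth-first connected-component search for contour extraction, and the function names, the weight `alpha`, and the `threshold` value are assumptions chosen for illustration.

```python
from collections import deque

import numpy as np


def update_background(background, frame, alpha=0.5):
    """Running average: bg <- (1 - alpha) * bg + alpha * frame (alpha is assumed)."""
    frame = frame.astype(np.float64)
    if background is None:
        return frame.copy()
    return (1.0 - alpha) * background + alpha * frame


def foreground_mask(background, frame, threshold=25.0):
    """Binary mask of pixels differing from the background by more than threshold."""
    return np.abs(frame.astype(np.float64) - background) > threshold


def largest_component(mask):
    """Return (row, col) pixels of the largest 4-connected foreground region.

    Stand-in for "largest contour": a plain BFS flood fill over the mask.
    """
    height, width = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue = deque([(int(r0), int(c0))])
        region = []
        while queue:
            r, c = queue.popleft()
            region.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < height and 0 <= nc < width
                if inside and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        if len(region) > len(best):
            best = region
    return best


def pointer_position(mask):
    """Topmost pixel of the largest region as (x, y), or None if the mask is empty."""
    region = largest_component(mask)
    if not region:
        return None
    row, col = min(region)  # tuples sort by row first, so this is the top pixel
    return (col, row)


# Synthetic ROI: a flat background, then a bright "finger" plus one noise pixel.
scene = np.full((20, 20), 10, dtype=np.uint8)
background = None
for _ in range(5):  # initial frames used to build the background
    background = update_background(background, scene)

live = scene.copy()
live[5:15, 8:11] = 200  # finger-like blob
live[2, 1] = 200        # isolated noise pixel, higher up but smaller
pointer = pointer_position(foreground_mask(background, live))
```

Selecting the largest region before taking its top point means the isolated noise pixel, although higher in the frame, does not capture the pointer. A circle drawn on the canvas at `pointer` for each frame would then trace the stroke.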

Paper Structure

This paper contains 9 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Left: Image from Webcam where ROI drawing happens. Right: A canvas image where drawing, VUI, and pointer movement happen. Both images have dimensions of (420, 720) pixels.
  • Figure 2: Running Air drawing, mode selection, and detection.