DIJIT: A Robotic Head for an Active Observer
Mostafa Kamali Tabrizi, Mingshi Chi, Bir Bikram Dey, Yu Qing Yuan, Markus D. Solbach, Yiqian Liu, Michael Jenkin, John K. Tsotsos
TL;DR
DIJIT introduces a fully binocular, human-inspired robotic head with nine mechanical degrees of freedom and four optical degrees of freedom per camera to enable active vision research and cross-domain comparisons with human vision. A data-driven saccade control approach uses calibration-based homographies to map fixation points to motor commands, avoiding complex kinematic modeling. Experimental results demonstrate human-like saccade accuracy and high peak speeds, with open-source hardware and software facilitating replication and further study. The work advances active binocular perception and lays groundwork for binocular 3D reconstruction and assistive robotics tasks.
Abstract
We present DIJIT, a novel binocular robotic head expressly designed for mobile agents that behave as active observers. DIJIT's breadth of functionality enables active vision research and the study of human-like eye and head-neck motions, their interrelationships, and how each contributes to visual ability. DIJIT is also being used to explore how human vision employs eye and head movements to solve visual tasks and how this differs from current computer vision methods. The design features nine mechanical degrees of freedom, while the cameras and lenses provide an additional four optical degrees of freedom. The ranges and speeds of the mechanical design are comparable to human performance. The design includes the ranges of motion required for convergent stereo, namely vergence, version, and cyclotorsion; their utility to both human and machine vision is still being explored. Here, we present the design of DIJIT and evaluate aspects of its performance. We also present a new method for saccadic camera movements that establishes a direct relationship between camera orientation and motor values. The resulting saccades are close to human movements in accuracy.
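The TL;DR describes the saccade method as a calibration-based homography that maps a fixation point to motor commands. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a single 3x3 homography per camera, fitted with the standard direct linear transform (DLT) from hypothetical calibration pairs of image points (pixel offsets from the image centre) and pan/tilt motor values. The function names, the units, and the synthetic calibration data are all illustrative assumptions.

```python
import numpy as np


def fit_homography(src, dst):
    """Estimate a 3x3 matrix H with dst ~ H @ src in homogeneous coordinates.

    Uses the plain DLT algorithm (no coordinate normalisation) and needs
    at least four point pairs.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or len(src) < 4:
        raise ValueError("need at least four matching 2D point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear constraints on the nine entries of H per point pair.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def saccade_command(h, fixation_xy):
    """Map an image fixation point to (pan, tilt) motor values using H."""
    x, y = fixation_xy
    p = h @ np.array([x, y, 1.0])
    return p[:2] / p[2]


# Hypothetical calibration set: a 3x3 grid of image points and the motor
# values that centred each of them (generated here from a made-up mapping).
H_TRUE = np.array([[0.05, 0.001, 0.2],
                   [-0.002, 0.048, -0.1],
                   [1e-5, -2e-5, 1.0]])
GRID = np.array([(x, y) for x in (-300, 0, 300) for y in (-300, 0, 300)], dtype=float)
MOTORS = np.array([saccade_command(H_TRUE, p) for p in GRID])

H_FIT = fit_homography(GRID, MOTORS)
pan, tilt = saccade_command(H_FIT, (120.0, -80.0))
print(f"pan={pan:.3f}, tilt={tilt:.3f}")
```

In practice the calibration pairs would come from measurements on the robot rather than from a known matrix. With noisy data, a normalised DLT or a robust estimator would likely be preferable to this bare version.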
