CNN-based Game State Detection for a Foosball Table
David Hagens, Jan M. Knaup, Elke Hergenröther, Andreas Weinmann
TL;DR
This work addresses the problem of obtaining a compact, information-rich game state for Foosball to support Deep Reinforcement Learning (DRL) and Imitation Learning. It introduces CNN-based end-to-end regressors, one per rod, that predict the rod's shift and rotation (encoded as ($s$, $\cos \varphi$, $\sin \varphi$)) from top-down camera images, with ground-truth rotations measured by accelerometers and ground-truth shifts derived with traditional Computer Vision techniques. An evaluation on a ground-truth dataset with 500 frames, using multiple backbones (ResNet, MobileNet, EfficientNet), shows that the approach achieves accurate state estimation (e.g., $\mathrm{MAE}_{\mathrm{shift}} \approx 3.88$ mm and $\mathrm{MAE}_{\mathrm{rot}} \approx 5.93^{\circ}$ for the ResNet18 model). The system is complemented by a ZeroMQ-based data provisioning system for real-time data sharing. The results show strong accuracy across rods, but also identify practical challenges such as lighting and motion blur, as well as the need for parallelized inference to reach the real-time performance that deployment in DRL and Imitation Learning settings requires.
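To make the ($s$, $\cos \varphi$, $\sin \varphi$) encoding concrete, the following is a minimal sketch of how a rotation could be encoded as a regression target, decoded from a prediction, and scored with a wrap-around-aware absolute error. The function names, the use of degrees, and the exact error definition are assumptions for illustration; the summary does not state how $\mathrm{MAE}_{\mathrm{rot}}$ is computed in the paper.

```python
import math


def encode_rotation(phi_deg):
    """Map an angle in degrees to a (cos, sin) pair (hypothetical target layout)."""
    phi = math.radians(phi_deg)
    return math.cos(phi), math.sin(phi)


def decode_rotation(cos_phi, sin_phi):
    """Recover the angle in degrees from a (cos, sin) pair.

    atan2 only depends on the direction of the vector, so predictions that
    are not exactly on the unit circle still decode to a valid angle.
    """
    return math.degrees(math.atan2(sin_phi, cos_phi))


def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in [0, 180] degrees."""
    diff = (pred_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def mean_absolute_error(errors):
    """Plain arithmetic mean of a non-empty list of absolute errors."""
    return sum(errors) / len(errors)


# Usage with made-up values: a slightly off-circle prediction for a 30 degree rod.
cos_t, sin_t = encode_rotation(30.0)
pred_deg = decode_rotation(1.1 * cos_t, 1.1 * sin_t)
rot_errors = [angular_error_deg(pred_deg, 30.0), angular_error_deg(179.0, -179.0)]
mae_rot = mean_absolute_error(rot_errors)
```

One reason to regress the cosine and sine instead of the raw angle is that the pair is continuous across the 180 degree boundary, so 179° and -179° are close in target space even though their raw values differ by 358.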
Abstract
The automation of games using Deep Reinforcement Learning (DRL) strategies is a well-known challenge in AI research. While feature extraction in video games typically uses the whole image, this is hardly practical for many real-world games. Instead, a smaller game state that reduces the dimension of the parameter space to the essential parameters seems to be a promising approach. In the game of Foosball, a compact and comprehensive game state description consists of the positional shifts and rotations of the figures and the position of the ball over time. In particular, velocities and accelerations can be derived from consecutive time samples of the game state. In this paper, a figure detection system to determine the game state in Foosball is presented. We capture a dataset in which the rotations of the rods are measured using accelerometers and the positional shifts are derived using traditional Computer Vision techniques (in a laboratory setting). This dataset is used to train Convolutional Neural Network (CNN) based end-to-end regression models that predict the rotation and shift of each rod. We evaluate our system using different state-of-the-art CNNs as base architectures for the regression model and show that it predicts the game state with high accuracy. By providing data for both the black and the white team, the presented system is intended to supply the data required for future developments of Imitation Learning techniques based on observing human players.
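The remark that velocities and accelerations follow from consecutive time samples can be illustrated with simple finite differences. This is only a sketch: the sample values, the 50 fps frame interval, and the helper name `finite_difference` are assumptions and do not come from the paper.

```python
def finite_difference(samples, dt):
    """Differences between consecutive samples, divided by the time step dt (seconds)."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


# Hypothetical rod shifts in mm at four consecutive frames.
shifts_mm = [10.0, 12.0, 16.0, 22.0]
dt = 0.02  # assumed frame interval (50 fps); the source does not state the frame rate

velocities = finite_difference(shifts_mm, dt)      # mm/s, one value per frame pair
accelerations = finite_difference(velocities, dt)  # mm/s^2
```

Each differencing step shortens the sequence by one sample and amplifies measurement noise, so a real pipeline would likely smooth the predicted states first. For rotations, the angle difference would additionally need to be wrapped into a half-open 360 degree interval before dividing by `dt`.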
