CNN-based Game State Detection for a Foosball Table

David Hagens; Jan M. Knaup; Elke Hergenröther; Andreas Weinmann

TL;DR

This work tackles the problem of obtaining a compact, information-rich game state for Foosball to support DRL and Imitation Learning. It introduces CNN-based end-to-end regressors, one per rod, that predict the shift and rotation of the rod, encoded as $(s, \cos \varphi, \sin \varphi)$, from top-down camera images. Ground truth is provided by accelerometers (rotation) and CV-derived measurements (shift). Experiments on a ground-truth dataset of 500 frames with several backbones (ResNet, MobileNet, EfficientNet) show accurate state estimation (e.g., $\mathrm{MAE}_{\mathrm{shift}} \approx 3.88$ mm and $\mathrm{MAE}_{\mathrm{rot}} \approx 5.93^{\circ}$ for the ResNet18 model). The detector is complemented by a ZeroMQ-based data provisioning system for real-time data sharing. The results show strong accuracy across rods but also identify practical challenges such as lighting and motion blur, as well as the need for parallelized inference to reach the real-time performance required for deployment in DRL and Imitation Learning settings.
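Encoding the rotation as a cosine/sine pair avoids the jump a raw angle makes at $\pm 180^{\circ}$. The following is a minimal sketch of how such an encoding could be produced, decoded, and scored with a wrap-aware angular error. All function names are hypothetical and not taken from the paper, and the paper's exact error definition may differ.

```python
import math

def encode_rod_state(shift_mm, angle_deg):
    """Encode a rod state as (s, cos(phi), sin(phi)). Hypothetical helper."""
    phi = math.radians(angle_deg)
    return (shift_mm, math.cos(phi), math.sin(phi))

def decode_angle_deg(cos_phi, sin_phi):
    """Recover the angle in degrees, in [-180, 180].

    atan2 only uses the direction of the (cos, sin) vector, so a network
    output that is not exactly unit length still decodes to an angle.
    """
    return math.degrees(math.atan2(sin_phi, cos_phi))

def angular_error_deg(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in [0, 180]."""
    diff = (pred_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

def mean_absolute_errors(preds, targets):
    """Return (MAE of shift, MAE of rotation in degrees).

    preds and targets are equally long lists of (s, cos, sin) tuples.
    """
    n = len(preds)
    shift_mae = sum(abs(p[0] - t[0]) for p, t in zip(preds, targets)) / n
    rot_mae = sum(
        angular_error_deg(decode_angle_deg(p[1], p[2]),
                          decode_angle_deg(t[1], t[2]))
        for p, t in zip(preds, targets)
    ) / n
    return shift_mae, rot_mae
```

With this error definition, a prediction of $179^{\circ}$ against a target of $-179^{\circ}$ counts as a $2^{\circ}$ error rather than $358^{\circ}$.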

Abstract

The automation of games using Deep Reinforcement Learning (DRL) strategies is a well-known challenge in AI research. While feature extraction in video games typically uses the whole image, this is hardly practical for many real-world games. Instead, a smaller game state that reduces the dimension of the parameter space to the essential parameters seems to be a promising approach. In the game of Foosball, a compact and comprehensive game state description consists of the positional shifts and rotations of the figures and the position of the ball over time. In particular, velocities and accelerations can be derived from consecutive time samples of the game state. In this paper, a figure detection system to determine the game state in Foosball is presented. We capture a dataset in a laboratory setting in which the rotations of the rods are measured using accelerometers and the positional shifts are derived using traditional Computer Vision techniques. This dataset is used to train Convolutional Neural Network (CNN) based end-to-end regression models that predict the rotation and shift of each rod. We present an evaluation of our system using different state-of-the-art CNNs as base architectures for the regression model. We show that our system is able to predict the game state with high accuracy. By providing data for both the black and the white team, the presented system is intended to supply the data required for future developments of Imitation Learning techniques with respect to observing human players.
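The remark that velocities and accelerations follow from consecutive time samples can be illustrated with simple finite differences. The sketch below is not the authors' implementation. It assumes a fixed sampling interval `dt` (a hypothetical parameter) and that a rod turns by less than $180^{\circ}$ between two frames, so that the angle difference can be wrapped unambiguously.

```python
def wrap_deg(delta):
    """Map an angle difference in degrees into [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0

def finite_difference(samples, dt):
    """Per-frame velocities from consecutive game-state samples.

    samples: list of (shift_mm, angle_deg) tuples taken every dt seconds.
    Returns a list of (shift velocity in mm/s, angular velocity in deg/s),
    one entry per consecutive pair of samples.
    """
    velocities = []
    for (s0, a0), (s1, a1) in zip(samples, samples[1:]):
        velocities.append(((s1 - s0) / dt, wrap_deg(a1 - a0) / dt))
    return velocities

def second_difference(velocities, dt):
    """Accelerations as differences of consecutive velocity estimates."""
    return [
        ((v1[0] - v0[0]) / dt, (v1[1] - v0[1]) / dt)
        for v0, v1 in zip(velocities, velocities[1:])
    ]
```

Finite differences amplify measurement noise, so in practice some smoothing of the predicted states would likely be needed before differentiating.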

Paper Structure (11 sections, 10 figures, 3 tables)

Introduction
Related Work
Contributions
Game State Detection
Dataset Creation
End-to-End Regressor Networks
Data Provisioning System
Evaluation
Quantitative Evaluation
Qualitative Evaluation
Conclusion, Discussion and Future Work

Figures (10)

Figure 1: The physical Foosball table. The black team is controlled by industrial linear and rotary motors, while the white team is controlled by humans. A Logitech BRIO webcam captures the playing field from a top-down perspective.
Figure 2: A working example of our figure detection system. The predicted positional shift and rotation angle of each rod are printed above the rod. The captured data can be used by a DRL agent through a ZeroMQ-based data provisioning system.
Figure 3: Measuring the rotation relative to the ground ($\alpha$) using a two-axis accelerometer. Since the gravitational acceleration of $1g$ is fixed and the $X$ and $Y$ axes are orthogonal, the accelerations measured on those axes are proportional to the sine and cosine of $\alpha$.
Figure 4: The hardware used to measure the rotation of the white figures. Four GY-521 modules with MPU6050 accelerometers on 3D-printed mounts were screwed on top of the rods. The accelerometers are connected to an ESP32-based microcontroller via an $I^2C$ connector. Additionally, we included a WS2812B-based LED strip to indicate the internal timing and a button to calibrate the zero-point of the accelerometers.
Figure 5: Our proposed dataset capturing process. One iteration of the process corresponds to one frame taken by the camera. While the black figures are moved automatically, the white figures need to be moved manually during dataset capture to obtain a varied set of shifts and rotations.
...and 5 more figures
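The measurement principle described in the caption of Figure 3 can be sketched in a few lines. The axis convention (X proportional to $\sin \alpha$, Y proportional to $\cos \alpha$) and the `zero_offset_deg` calibration parameter are assumptions made for illustration. The paper's firmware may use a different convention.

```python
import math

def rod_angle_deg(accel_x, accel_y, zero_offset_deg=0.0):
    """Estimate the rod rotation alpha from a two-axis accelerometer reading.

    Assumes accel_x is proportional to sin(alpha) and accel_y to cos(alpha),
    both with the same scale factor (gravity), so the scale cancels in atan2.
    zero_offset_deg is a hypothetical calibration value, e.g. the raw angle
    recorded when a zero-point calibration button is pressed.
    Returns the angle in degrees, wrapped into [-180, 180).
    """
    alpha = math.degrees(math.atan2(accel_x, accel_y))
    return (alpha - zero_offset_deg + 180.0) % 360.0 - 180.0
```

Because only the ratio of the two readings matters, the result does not depend on the sensor's output units. The estimate is only meaningful while gravity dominates the measured acceleration, that is, when the rod is at rest or moving slowly.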
