pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild

Jonas Myhre Schiøtt, Viktor Sebastian Petersen, Dim P. Papadopoulos

TL;DR

The paper tackles automatic shot suggestion for 8-ball pool from a single image: it first detects the table geometry and ball positions, then places the resulting state in a standardized RL environment to predict the best shot. It contributes (i) a pool object dataset of 195 images captured from varied viewpoints, with 5748 annotated segmentation masks, (ii) a two-stage architecture consisting of a Ball Location Model and a Shot Suggestion Model, (iii) a Gymnasium-based RL environment with Pymunk physics, and (iv) empirical evidence that standard RL algorithms fail to clear a full table without fouling, while a simple baseline reaches a per-shot success rate of 94.7%. Together, the dataset and environment provide practical tools and a benchmark for research on AI-assisted pool coaching.

Abstract

Computer vision models have seen increased usage in sports, and reinforcement learning (RL) is famous for beating humans in strategic games such as Chess and Go. In this paper, we are interested in building upon these advances and examining the game of classic 8-ball pool. We introduce pix2pockets, a foundation for an RL-assisted pool coach. Given a single image of a pool table, we first aim to detect the table and the balls and then propose the optimal shot suggestion. For the first task, we build a dataset with 195 diverse images where we manually annotate all balls and table dots, leading to 5748 object segmentation masks. For the second task, we build a standardized RL environment that allows easy development and benchmarking of any RL algorithm. Our object detection model yields an AP50 of 91.2 while our ball location pipeline obtains an error of only 0.4 cm. Furthermore, we compare standard RL algorithms to set a baseline for the shot suggestion task and we show that all of them fail to pocket all balls without making a foul move. We also present a simple baseline that achieves a per-shot success rate of 94.7% and clears a full game in a single turn 30% of the time.

Paper Structure

This paper contains 11 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: pix2pockets. We introduce a new task for shot suggestions in pool games using a single input image. First, we detect the table and estimate the position of the balls. Then, they are fed into a pool environment, and a Reinforcement Learning agent predicts the best available shot (i.e., cue angle and shot power).
  • Figure 2: Our Dataset. (a) It contains 195 annotated images of tables captured from various angles with diverse lighting conditions. (b) We annotate 5748 objects with accurate segmentation masks. The maximum number of objects in the image varies from class to class. (c) Bounding box annotated examples. Note how sometimes the balls are not completely visible from the given view.
  • Figure 3: Full pipeline. The input image $I_{\text{in}}$ is run through the Ball Location Model to estimate the ball positions on the table, which are then handed to the Shot Suggestion Model. First, we obtain the dot detections $d$ and the ball detections $b$ on $I_{\text{in}}$. We use $d$ to find the table lines and thus estimate a mapping $H$ from $I_{\text{in}}$ to a template $T$. Then, we use $H$ to estimate the center point $p$ of each ball in $b$, resulting in the positions $\widetilde{p}$ for the environment. The Shot Suggestion Model sends the state $S \in \mathcal{S}$ to the agent, which suggests an action $A \in \mathcal{A}$. During training, the environment evaluates the action, and the agent receives a reward $R \in \mathcal{R}$.
  • Figure 4: (a) Training size. The AP50 of models with different training set sizes, showing diminishing gains after 80 images. (b) Shot Accuracy. To determine the required shot precision, we test the performance for different $\sigma$ values. In the 1-Ball environment, a precision of 0.25 degrees is enough to pocket the ball. With additional balls, ball-ball interactions require finer precision (0.01 degrees).
  • Figure 5: Projection error. To estimate the projection error, the front-view and 45-view projections are compared to the top-view ground truth. The projection result is shown on the top-view image for accuracy assessment. The blue lines indicate the distance from the estimated center point $p$ to the ground truth. The mean shift is compared to the table length in $T$, and scaled to a regular 9ft table.
  • ...and 2 more figures
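
The Figure 3 caption describes estimating a mapping $H$ from the input image to a table template $T$ and using it to place ball points on that template. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes that four table corners have already been recovered from the table lines, and it fits a planar homography to them by solving the standard 8-unknown linear system. All function names, pixel coordinates, and the 200 x 100 template size are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4_points(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Return the 3x3 homography (with H[2][2] fixed to 1) mapping src to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and likewise for v
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(H: Sequence[Sequence[float]], p: Point) -> Point:
    """Apply H to an image point and dehomogenize."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    )


# Hypothetical pixel coordinates of the four table corners in the input image.
image_corners = [(120.0, 400.0), (520.0, 400.0), (420.0, 150.0), (220.0, 150.0)]
# Hypothetical template with a 2:1 playing surface, in arbitrary units.
template_corners = [(0.0, 0.0), (200.0, 0.0), (200.0, 100.0), (0.0, 100.0)]

H = homography_from_4_points(image_corners, template_corners)
ball_in_image = (320.0, 300.0)  # hypothetical detected ball point
ball_on_template = project(H, ball_in_image)
print(ball_on_template)
```

A homography is only exact for points on the table plane, so which image point is chosen to represent each ball affects the projected position. With more than four correspondences (for example, many detected table dots), a least-squares or robust fit would usually be preferred over this exact four-point solve.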