
System Design for Maintaining Internal State Consistency in Long-Horizon Robotic Tabletop Games

Guangyu Zhao, Ceyao Zhang, Chengdong Ma, Tao Wu, Yiyang Song, Haoxuan Ru, Yifan Zhong, Ruilin Yan, Lingfeng Li, Ruochong Li, Yu Li, Xuyuan Han, Yun Ding, Ruizhang Jiang, Xiaochuan Zhang, Yichao Li, Yuanpei Chen, Yaodong Yang, Yitao Liang

Abstract

Long-horizon tabletop games pose a distinct systems challenge for robotics: small perceptual or execution errors can invalidate accumulated task state, propagate across decision-making modules, and ultimately derail interaction. This paper studies how to maintain internal state consistency in turn-based, multi-human robotic tabletop games through deliberate system design rather than isolated component improvement. Using Mahjong as a representative long-horizon setting, we present an integrated architecture that explicitly maintains perceptual, execution, and interaction state, partitions high-level semantic reasoning from time-critical perception and control, and incorporates verified action primitives with tactile-triggered recovery to prevent premature state corruption. We further introduce interaction-level monitoring mechanisms to detect turn violations and hidden-information breaches that threaten execution assumptions. Beyond demonstrating complete-game operation, we provide an empirical characterization of failure modes, recovery effectiveness, cross-module error propagation, and hardware-algorithm trade-offs observed during deployment. Our results show that explicit partitioning, monitored state transitions, and recovery mechanisms are critical for sustaining executable consistency over extended play, whereas monolithic or unverified pipelines lead to measurable degradation in end-to-end reliability. The proposed system serves as an empirical platform for studying system-level design principles in long-horizon, turn-based interaction.
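The abstract's central mechanism can be sketched as follows: a state update is committed only after a physical action has been verified, and on failure the state is left untouched so that recovery starts from a consistent view. This is a minimal illustration under stated assumptions, not the authors' implementation. `GameState`, `execute_verified`, `check_turn`, and the `grasp`/`tactile_ok` callables are hypothetical names. A real system would wrap perception, motion planning, and sensor drivers behind them.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GameState:
    """Hypothetical internal state: robot hand, public discards, and turn order."""
    hand: List[str]
    discards: List[str] = field(default_factory=list)
    current_player: int = 0  # seat whose turn it is (robot is seat 0 in this sketch)
    num_players: int = 4


def execute_verified(
    state: GameState,
    tile: str,
    grasp: Callable[[str], None],
    tactile_ok: Callable[[], bool],
    max_retries: int = 2,
) -> bool:
    """Discard `tile`, committing to `state` only after tactile verification.

    `grasp` (the physical primitive) and `tactile_ok` (the sensor check) are
    hypothetical stand-ins for the robot's real interfaces.
    """
    for _ in range(max_retries + 1):
        grasp(tile)
        if tactile_ok():
            # Verified: now, and only now, update the maintained state.
            state.hand.remove(tile)
            state.discards.append(tile)
            state.current_player = (state.current_player + 1) % state.num_players
            return True
        # Not verified: leave state untouched and retry the primitive.
    return False  # caller escalates, e.g. by pausing play for human help


def check_turn(state: GameState, actor: int) -> bool:
    """Interaction-level monitor: False means `actor` moved out of turn."""
    return actor == state.current_player
```

State is mutated in exactly one place, after verification. As a result, a failed or unverified grasp does not alter the recorded hand, discards, or turn, and downstream reasoning never sees an action that did not demonstrably happen.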

Paper Structure

This paper contains 20 sections, 5 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Overview of the tabletop Mahjong system. The architecture centers on a maintained internal game state (perceptual, execution, and interaction state). A vision-language model performs strategic reasoning and rule interpretation at low frequency, while time-critical perception modules support real-time detection and pose estimation for manipulation. Action primitives are executed with tactile-based verification and recovery to prevent premature state updates. The bottom examples illustrate how game history informs reasoning and is grounded into verified physical actions within a closed-loop system.
  • Figure 2: Cross-module error propagation in long-horizon tabletop gameplay. Perceptual, execution, or interaction errors can propagate through the maintained game history and internal state, corrupting downstream reasoning and actions if they are not detected and recovered from. The red dashed lines represent propagation paths that could immediately create visible insecurity.
  • Figure 3: Representative deployment environments and gameplay settings. Panels show: (1) top-down views of the Mahjong table layout and per-player tile walls; (2) a close-up of manipulation during discarding and claiming; (3) multi-human turn-based gameplay in laboratory conditions; and (4) a public exhibition deployment with multiple participants.
  • Figure 4: Smoothed hardware error probability as a function of continuous operation time. Error frequency increases noticeably after approximately 20,000 seconds (roughly 5.6 hours) of sustained execution.
  • Figure 5: Training pipeline for strategic reasoning. Stage 1 distills a conventional RL-trained policy into the VLM via supervised fine-tuning with LLM-synthesized reasoning traces. Stage 2 applies single-step RL (GRPO) to optimize decision quality beyond imitation. Stage 3 uses self-play with DPO to discover strategies that surpass the original teacher policy.
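Stages 2 and 3 of the Figure 5 pipeline rely on two standard objectives. The sketch below shows both of them. The first is the group-relative advantage used by GRPO: each sampled response's reward minus the group mean, divided by the group standard deviation. The second is the per-pair DPO loss. This is an illustrative assumption, not the paper's training code. The function names, the use of the population standard deviation, the `eps` term, and `beta = 0.1` are choices made here. How rewards or preference pairs are derived from Mahjong decisions is not specified in this section.

```python
import math
from statistics import fmean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for a group of responses sampled for one prompt."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary
    # eps keeps the result finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO loss -log(sigmoid(margin)) for one preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy and under the frozen reference model.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)
```

Advantages are normalized within each group, so a candidate is rewarded only relative to other responses sampled for the same input. For this reason, GRPO needs no learned value function.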