Aligning Large Language Models with Procedural Rules: An Autoregressive State-Tracking Prompting for In-Game Trading

Minkyung Kim, Junsik Kim, Woongcheol Yang, Sangdon Park, Sohee Bae

TL;DR

This paper tackles the challenge of reconciling LLM-driven conversational flexibility with strict procedural constraints in in-game trading. It introduces Autoregressive State-Tracking Prompting (ASTP), a prompting framework that makes dialogue state tracking explicit and verifiable through a Prime–Guide–Enforce workflow and explicit reporting of the previous state. A four-element design embeds state definitions, transition conditions, previous-state identification, and enforcement into the prompt, achieving near-perfect state-transition compliance and robust transaction integrity. A placeholder-based post-processing (PPP) mechanism further enhances numerical reliability, enabling smaller models to match larger-model performance while delivering substantial latency reductions. Overall, ASTP demonstrates a practical path to reliable, real-time, rule-governed NPC trading and offers a foundation for broader applications requiring expressive yet compliant language interactions.

Abstract

Large Language Models (LLMs) enable dynamic game interactions but fail to follow essential procedural flows in rule-governed trading systems, eroding player trust. This work resolves the core tension between the creative flexibility of LLMs and the procedural demands of in-game trading (browse-offer-review-confirm). To this end, Autoregressive State-Tracking Prompting (ASTP) is introduced, a methodology centered on a strategically orchestrated prompt that compels an LLM to make its state-tracking process explicit and verifiable. Instead of relying on implicit contextual understanding, ASTP tasks the LLM with identifying and reporting a predefined state label from the previous turn. To ensure transactional integrity, this is complemented by a state-specific placeholder post-processing method for accurate price calculations. Evaluation across 300 trading dialogues demonstrates >99% state compliance and 99.3% calculation precision. Notably, ASTP with placeholder post-processing on smaller models (Gemini-2.5-Flash) matches larger models' (Gemini-2.5-Pro) performance while reducing response time from 21.2s to 2.4s, establishing a practical foundation that satisfies both real-time requirements and resource constraints of commercial games.
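The two mechanisms the abstract describes, explicit reporting of the previous state and placeholder post-processing of prices, can be illustrated with a minimal sketch. This is not the authors' implementation. The state labels come from the paper's figure captions, and the FINAL_CHECK-before-COMMIT_SALE rule comes from the Figure 5 caption. The `Turn` structure, the `{TOTAL_PRICE}` token, and the function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# State labels as listed in the paper's figure captions.
STATES = {"CASUAL", "SHOW_ITEMS", "OFFER_SELL", "NEGOTIATE",
          "FINAL_CHECK", "COMMIT_SALE", "END"}

# Hypothetical placeholder token; the paper's actual token format is not given here.
TOTAL_PLACEHOLDER = "{TOTAL_PRICE}"


@dataclass
class Turn:
    """One model turn (assumed output shape, not the paper's schema)."""
    previous_state: str  # label the model reports for the previous turn
    current_state: str   # label the model assigns to this turn
    reply: str           # NPC text, possibly containing the placeholder


def check_turn(turn: Turn, actual_previous_state: str) -> list[str]:
    """Return a list of rule violations for a single turn."""
    problems = []
    if turn.previous_state not in STATES or turn.current_state not in STATES:
        problems.append("unknown state label")
    if turn.previous_state != actual_previous_state:
        problems.append("misreported previous state")
    # A sale may only be committed directly after the final check.
    if turn.current_state == "COMMIT_SALE" and actual_previous_state != "FINAL_CHECK":
        problems.append("COMMIT_SALE reached without FINAL_CHECK")
    return problems


def fill_total(reply: str, cart: dict[str, int], prices: dict[str, int]) -> str:
    """Compute the total in code and substitute it for the placeholder."""
    total = sum(prices[item] * qty for item, qty in cart.items())
    return reply.replace(TOTAL_PLACEHOLDER, str(total))


if __name__ == "__main__":
    turn = Turn("NEGOTIATE", "COMMIT_SALE", "Deal!")
    print(check_turn(turn, "NEGOTIATE"))
    print(fill_total("That comes to {TOTAL_PRICE} gold.",
                     {"potion": 3, "sword": 1},
                     {"potion": 15, "sword": 120}))
```

Because the game code does the arithmetic rather than the model, the displayed total cannot drift from the catalogue prices. That is the intuition behind pairing state tracking with post-processing.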

Paper Structure

This paper contains 33 sections, 3 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: Comparison of static menus and LLM-driven trading interactions showcasing complex purchases, context-aware recommendations, and relationship-based negotiations within a semi-structured dialogue flow.
  • Figure 2: A simplified view of the prompt structure showing the implementation of ASTP's four design elements. Colors highlight each element's role: blue, red, purple, and green for Elements 1, 2, 3, and 4, respectively. (Best viewed in color.)
  • Figure 3: A visual comparison of the prompt architectures and their constituent elements for the methods evaluated in the State Transition Compliance experiment. Colors correspond to ASTP's four key design elements: blue, red, purple, and green for Elements 1, 2, 3, and 4, respectively. (Best viewed in color.)
  • Figure 4: State transition patterns in 300 dialogues. The cell value at row $i$ and column $j$ is the number of transitions from state $i$ to state $j$. Abbreviations: C (CASUAL), SI (SHOW_ITEMS), OS (OFFER_SELL), N (NEGOTIATE), FC (FINAL_CHECK), CS (COMMIT_SALE), E (END). Note that the END state does not necessarily signify the termination of the entire dialogue.
  • Figure 5: State transition patterns observed in 300 dialogues for the comparative methods and ASTP across two scenarios. The cell value at row $i$ and column $j$ is the number of transitions from state $i$ to state $j$. A key point of comparison is the enforcement of the FINAL_CHECK step before COMMIT_SALE. ASTP shows strong procedural adherence, predominantly following the required FINAL_CHECK$\rightarrow$COMMIT_SALE path, whereas the comparative methods frequently bypass this safeguard with direct, non-compliant transitions from OFFER_SELL and NEGOTIATE. Abbreviations: C (CASUAL), SI (SHOW_ITEMS), OS (OFFER_SELL), N (NEGOTIATE), FC (FINAL_CHECK), CS (COMMIT_SALE), E (END). Note that the END state does not necessarily signify the termination of the entire dialogue.
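The transition matrices in Figures 4 and 5 can be read as simple counts over per-dialogue state sequences. The sketch below shows one way such a matrix might be built and how entries into COMMIT_SALE that skip FINAL_CHECK could be tallied. The function names and the input format (a list of state-label lists) are assumptions for illustration, not the paper's evaluation code.

```python
from collections import Counter

# Row/column order follows the abbreviations in the figure captions.
ORDER = ["CASUAL", "SHOW_ITEMS", "OFFER_SELL", "NEGOTIATE",
         "FINAL_CHECK", "COMMIT_SALE", "END"]


def transition_matrix(dialogues: list[list[str]]) -> list[list[int]]:
    """Count transitions: cell [i][j] is the number of moves from state i to state j."""
    counts = Counter()
    for states in dialogues:
        # Pair each state with its successor within the same dialogue.
        counts.update(zip(states, states[1:]))
    return [[counts[(src, dst)] for dst in ORDER] for src in ORDER]


def bypass_count(matrix: list[list[int]]) -> int:
    """Count entries into COMMIT_SALE from any row other than FINAL_CHECK.

    Assumption: every such transition is treated as non-compliant,
    including a COMMIT_SALE self-loop if one occurs.
    """
    cs = ORDER.index("COMMIT_SALE")
    fc = ORDER.index("FINAL_CHECK")
    return sum(row[cs] for i, row in enumerate(matrix) if i != fc)


if __name__ == "__main__":
    dialogues = [
        ["CASUAL", "SHOW_ITEMS", "OFFER_SELL", "FINAL_CHECK", "COMMIT_SALE", "END"],
        ["CASUAL", "OFFER_SELL", "NEGOTIATE", "COMMIT_SALE"],  # skips FINAL_CHECK
    ]
    matrix = transition_matrix(dialogues)
    for label, row in zip(ORDER, matrix):
        print(f"{label:12s} {row}")
    print("bypasses:", bypass_count(matrix))
```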

Theorems & Definitions (1)

  • Definition 1: ASTP Function