How to discretize continuous state-action spaces in Q-learning: A symbolic control approach

Sadek Belamfedel Alaoui, Adnane Saoud

TL;DR

The work addresses the challenge of applying Q-learning to systems with continuous state-action spaces by introducing a symbolic abstraction that pairs a continuous system $\Sigma$ with a finite abstract model $\Sigma_{\mathrm{D}}$ through an alternating-simulation relation. It develops a Q-learning framework on the symbolic model that produces two Q-tables, $\underline{q}$ and $\overline{q}$, which bound the true continuous-space Q-values from below and above and approach the optimal Q-values as the discretization becomes finer. Theoretical results link the tightness of the bounds to the quantization parameters $\eta$ and $\mu$, and establish contraction properties, the existence of unique optimal values, and conditions under which the extracted controller approaches optimality. Empirical demonstrations on the Mountain Car and Van der Pol oscillator case studies illustrate that the dual-Q approach yields accurate, verifiable controllers and outperforms naïve uniform discretization, with convergence guaranteed under appropriate learning settings. The framework provides a principled, controllable way to trade computational complexity for accuracy in continuous-domain reinforcement learning.

Abstract

Q-learning is widely recognized as an effective approach for synthesizing controllers to achieve specific goals. However, handling challenges posed by continuous state-action spaces remains an ongoing research focus. This paper presents a systematic analysis that highlights a major drawback in space discretization methods. To address this challenge, the paper proposes a symbolic model that represents behavioral relations, such as alternating simulation from abstraction to the controlled system. This relation allows for seamless application of the synthesized controller based on abstraction to the original system. Introducing a novel Q-learning technique for symbolic models, the algorithm yields two Q-tables encoding optimal policies. Theoretical analysis demonstrates that these Q-tables serve as both upper and lower bounds on the Q-values of the original system with continuous spaces. Additionally, the paper explores the correlation between the parameters of the space abstraction and the loss in Q-values. The resulting algorithm facilitates achieving optimality within an arbitrary accuracy, providing control over the trade-off between accuracy and computational complexity. The obtained results provide valuable insights for selecting appropriate learning parameters and refining the controller. The engineering relevance of the proposed Q-learning based symbolic model is illustrated through two case studies.
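
The idea of two Q-tables that bracket the true values can be illustrated with a small sketch. It is not the paper's algorithm, which is not reproduced on this page. It assumes a hypothetical one-dimensional system on $[0, 1]$ with dynamics $x' = \mathrm{clip}(x + 0.15\,u)$, a reward of 1 for landing in the last cell, and a backup that takes the worst case (min) or best case (max) over all cells a symbolic transition may reach. The names `successors` and `dual_q_tables`, the dynamics, the reward, and the min/max backup are all assumptions made for illustration.

```python
def successors(cell, action, n_cells, eta, step):
    """Over-approximate the cells reachable from `cell` under `action`.

    Assumed toy dynamics: x' = clip(x + step * action, 0, 1).
    The image of the cell interval is mapped back onto the grid.
    """
    lo = min(max(cell * eta + step * action, 0.0), 1.0)
    hi = min(max((cell + 1) * eta + step * action, 0.0), 1.0)
    first = min(int(lo / eta), n_cells - 1)
    last = min(int(hi / eta), n_cells - 1)
    return range(first, last + 1)


def dual_q_tables(n_cells=4, actions=(-1, 1), step=0.15,
                  gamma=0.9, alpha=0.5, sweeps=200):
    """Return (q_lo, q_hi): pessimistic and optimistic tables on the grid."""
    eta = 1.0 / n_cells          # state quantization step
    goal = n_cells - 1           # assumed goal: the right-most cell
    q_lo = [[0.0] * len(actions) for _ in range(n_cells)]
    q_hi = [[0.0] * len(actions) for _ in range(n_cells)]

    def reward(cell):
        return 1.0 if cell == goal else 0.0

    for _ in range(sweeps):
        for s in range(n_cells):
            for a, u in enumerate(actions):
                succ = successors(s, u, n_cells, eta, step)
                # Worst case over successor cells for the lower table,
                # best case over successor cells for the upper table.
                t_lo = min(reward(s2) + gamma * max(q_lo[s2]) for s2 in succ)
                t_hi = max(reward(s2) + gamma * max(q_hi[s2]) for s2 in succ)
                q_lo[s][a] += alpha * (t_lo - q_lo[s][a])
                q_hi[s][a] += alpha * (t_hi - q_hi[s][a])
    return q_lo, q_hi


q_lo, q_hi = dual_q_tables()
for s, (row_lo, row_hi) in enumerate(zip(q_lo, q_hi)):
    print(s, [round(v, 3) for v in row_lo], [round(v, 3) for v in row_hi])
```

Both tables start from zero and use the same learning rate, so in this toy setting the lower table never exceeds the upper one. The gap between them reflects the ambiguity introduced by the grid: where a cell's image straddles several cells, the pessimistic and optimistic backups disagree. Increasing `n_cells` (a finer $\eta$) is the knob one would turn to study how that gap changes.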

Paper Structure

This paper contains 22 sections, 13 theorems, 57 equations, 10 figures, 1 table, 3 algorithms.

Key Result

Lemma 1

Under Assumption Compact, there exists a positive constant $L_{\mathcal{A}}$ such that for every distinct $\xi, \overline{\xi} \in \mathcal{S}$, $v \in \mathcal{A}(\xi)$ and $\overline{v} \in \mathcal{A}(\overline{\xi})$ we have:

Figures (10)

  • Figure 1: Mismatch between the actual system trajectory and the trajectory obtained by uniform discretization under the same policy $\pi$. The green cells represent the actual transitions, whereas the red cells represent the transitions captured by the uniform discretization.
  • Figure 2: Principle of abstraction-based Q-learning.
  • Figure 3: The $k^{th}$ Q-value function for a given symbolic state $s$ over the continuous state space containing $\xi$.
  • Figure 4: Car trajectory obtained using the refined optimal policy derived from the minimal Q-values using Algorithm \ref{alg:q_learning_symbolic}.
  • Figure 5: Car trajectory obtained using the refined optimal policy derived from the maximal Q-values using Algorithm \ref{alg:q_learning_symbolic}.
  • ...and 5 more figures
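
The mismatch described in the caption of Figure 1 can be reproduced in a few lines. The sketch below is only an illustration under stated assumptions: it uses a hypothetical one-dimensional shift dynamics `step`, and it assumes that the uniform discretization predicts transitions by simulating from the cell center. Whether this is exactly the construction analyzed in the paper is not stated on this page. The names `cell_of`, `center_of`, and `step` are made up for the example.

```python
import math


def cell_of(x, eta):
    """Index of the uniform grid cell of width eta that contains x."""
    return math.floor(x / eta)


def center_of(cell, eta):
    """Center point of a grid cell, used as its representative state."""
    return (cell + 0.5) * eta


def step(x, u):
    """Hypothetical toy dynamics: shift the state by the input u."""
    return x + u


eta = 1.0          # cell width (state quantization)
x, u = 0.9, 0.4    # true state near the right edge of cell 0

# Transition of the real system, then mapped to a cell.
actual_cell = cell_of(step(x, u), eta)

# Transition predicted by simulating from the center of the current cell.
predicted_cell = cell_of(step(center_of(cell_of(x, eta), eta), u), eta)

print(actual_cell, predicted_cell)
```

Here the true state leaves its cell while the center-based model predicts that it stays, so a policy evaluated on the grid model follows a different sequence of cells than the real system does.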

Theorems & Definitions (30)

  • Lemma 1
  • Proof
  • Theorem 1
  • Definition 1
  • Definition 2
  • Proposition 1
  • Proof
  • Proposition 2
  • Corollary 1
  • Proof
  • ...and 20 more