A Course in Dynamic Optimization

Bar Light

TL;DR

The notes introduce reinforcement learning, including a formal proof of the convergence of Q-learning algorithms, and cover policy gradient methods for the average reward case, presenting a convergence result for the tabular case.

Abstract

These lecture notes are derived from a graduate-level course in dynamic optimization, offering an introduction to techniques and models extensively used in management science, economics, operations research, engineering, and computer science. The course emphasizes the theoretical underpinnings of discrete-time dynamic programming models and advanced algorithmic strategies for solving these models. Unlike typical treatments, it provides a proof of the principle of optimality for upper semi-continuous dynamic programming, a middle ground between the simpler countable state space case \cite{bertsekas2012dynamic} and the more involved universally measurable case \cite{bertsekas1996stochastic}. This approach is sufficiently rigorous to include important examples such as dynamic pricing, consumption-savings, and inventory management models. The course also examines the properties of value and policy functions, leveraging classical results \cite{topkis1998supermodularity} and recent developments. Additionally, it offers an introduction to reinforcement learning, including a formal proof of the convergence of Q-learning algorithms. Finally, the notes cover policy gradient methods for the average reward case, presenting a convergence result for the tabular case. This result is simple and similar to the discounted case but appears to be new.

Paper Structure (42 sections, 36 theorems, 216 equations, 2 figures, 1 algorithm)

Key Result

Proposition 1.1

Let $(X,d)$ be a complete metric space. Let $T : X \rightarrow X$ be an $L$-contraction, i.e., $d(T(x),T(y)) \leq L d(x,y)$ for some $0<L<1$ and for all $x,y \in X$. Then $T$ has a unique fixed point.
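
The proposition can be illustrated numerically by repeatedly applying $T$ from an arbitrary starting point. The sketch below is not taken from the notes: the helper `fixed_point_iterate` and the example map $T(x) = 0.5x + 1$ on the real line (a contraction with $L = 0.5$ and fixed point $x = 2$) are assumptions chosen purely for illustration.

```python
def fixed_point_iterate(T, x0, tol=1e-10, max_iter=10_000):
    """Iterate x_{k+1} = T(x_k) until successive iterates differ by at most tol.

    Hypothetical helper for illustration; it works on real numbers with the
    usual distance d(x, y) = |x - y|.
    """
    x = x0
    for k in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) <= tol:
            return x_next, k
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


# Assumed example map: |T(x) - T(y)| = 0.5 * |x - y|, so T is a 0.5-contraction
# on the real line, and solving x = 0.5 * x + 1 gives the fixed point x = 2.
def T(x):
    return 0.5 * x + 1.0


# Two very different starting points are driven to the same fixed point,
# which is what uniqueness of the fixed point suggests.
x_star, n_iter = fixed_point_iterate(T, x0=100.0)
x_star_other, _ = fixed_point_iterate(T, x0=-7.0)
```

In this example both runs end near $x = 2$, and the distance to the fixed point halves at every step, matching the contraction factor $L = 0.5$.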

Figures (2)

  • Figure 1: Aggregated states diagram illustrating the partitioning of 13 original states into four aggregated states $x_1$, $x_2$, $x_3$, and $x_4$. Each square represents an aggregated state containing various original states.
  • Figure 2: Online search combined with "offline" pre-trained values.

Theorems & Definitions (66)

  • Definition 1.1
  • Proposition 1.1
  • Definition 1.2
  • Remark 1.1
  • Remark 1.2
  • Remark 1.3
  • Remark 2.1
  • Remark 2.2
  • Example 2.1
  • Example 2.2
  • ...and 56 more