A Course in Dynamic Optimization

Bar Light


TL;DR

The notes introduce reinforcement learning, including a formal proof that Q-learning converges, and study policy gradient methods for the average reward case, presenting a convergence result for the tabular setting.

Abstract

These lecture notes are derived from a graduate-level course in dynamic optimization, offering an introduction to techniques and models extensively used in management science, economics, operations research, engineering, and computer science. The course emphasizes the theoretical underpinnings of discrete-time dynamic programming models and advanced algorithmic strategies for solving these models. Unlike typical treatments, it provides a proof of the principle of optimality for upper semi-continuous dynamic programming, a middle ground between the simpler countable state space case \cite{bertsekas2012dynamic} and the more involved universally measurable case \cite{bertsekas1996stochastic}. This approach is rigorous enough to cover important examples such as dynamic pricing, consumption-savings, and inventory management models. The course also studies the properties of value and policy functions, drawing on classical results \cite{topkis1998supermodularity} and recent developments. Additionally, it offers an introduction to reinforcement learning, including a formal proof of the convergence of Q-learning algorithms. Finally, the notes study policy gradient methods for the average reward case and present a convergence result for the tabular setting. This result is simple and similar to the discounted case but appears to be new.
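To make the Q-learning part of the abstract concrete, here is a minimal sketch of tabular Q-learning on a small discounted MDP. Everything in it is an illustrative assumption rather than the notes' own setup: the 2-state, 2-action MDP (`REWARDS`, `TRANSITIONS`, `BETA`), the uniformly random behavior policy, and the step size $n(s,a)^{-0.7}$ are hypothetical choices. They are picked to satisfy the standard sufficient conditions for convergence (every state-action pair visited infinitely often, $\sum_n \alpha_n = \infty$, $\sum_n \alpha_n^2 < \infty$); the exact assumptions used in the notes' proof may differ. The helper `optimal_q` computes $Q^*$ by iterating the Bellman operator so the learned table can be compared against it.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP (illustrative numbers only).
REWARDS = np.array([[1.0, 0.0],
                    [0.0, 2.0]])                      # r(s, a)
TRANSITIONS = np.array([[[0.9, 0.1], [0.2, 0.8]],
                        [[0.5, 0.5], [0.0, 1.0]]])    # p(s' | s, a)
BETA = 0.5                                            # discount factor


def optimal_q(rewards, transitions, beta, tol=1e-12):
    """Model-based benchmark: iterate Q <- r + beta * P max_a' Q to its fixed point."""
    q = np.zeros_like(rewards)
    while True:
        q_next = rewards + beta * transitions @ q.max(axis=1)
        if np.max(np.abs(q_next - q)) < tol:
            return q_next
        q = q_next


def q_learning(rewards, transitions, beta, n_steps=50_000, seed=0):
    """Model-free tabular Q-learning along one simulated trajectory.

    Actions are drawn uniformly at random (an assumed exploratory behavior
    policy), and the step size for (s, a) is visits(s, a) ** -0.7.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = rewards.shape
    cum_p = np.cumsum(transitions, axis=2)            # CDFs for sampling s'
    q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    actions = rng.integers(n_actions, size=n_steps)
    uniforms = rng.random(n_steps)
    s = 0
    for t in range(n_steps):
        a = actions[t]
        # Inverse-CDF sampling of the next state; min() guards against rounding.
        s_next = min(int(np.searchsorted(cum_p[s, a], uniforms[t], side="right")),
                     n_states - 1)
        visits[s, a] += 1
        alpha = visits[s, a] ** -0.7
        target = rewards[s, a] + beta * q[s_next].max()
        q[s, a] += alpha * (target - q[s, a])          # stochastic approximation step
        s = s_next
    return q


q_star = optimal_q(REWARDS, TRANSITIONS, BETA)
q_hat = q_learning(REWARDS, TRANSITIONS, BETA)
```

The learned table `q_hat` uses only sampled transitions, never the matrix `TRANSITIONS` itself, yet it should end up close to the model-based `q_star`; that gap is what a convergence proof for Q-learning controls.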


Paper Structure

The notes contain 42 sections, 36 theorems, 216 equations, 2 figures, and 1 algorithm.

Lecture 1: Introduction, Metric Spaces, Probability Spaces
Introduction
A Few Key Notions in Dynamic Optimization
Metric Spaces
Probability Spaces
Exercise 1
Lecture 2: The Principle of Optimality in Dynamic Programming
Discounted Dynamic Programming
The Dynamic Programming Principle
Upper Semi-Continuous Dynamic Programming
Exercise 2
Lecture 3: Properties of the Value and Policy Functions
Stochastic Dominance
Value Function Properties
Properties of the Optimal Policy Function
...and 27 more sections

Key Result

Proposition 1.1

Let $(X,d)$ be a complete metric space. Let $T : X \rightarrow X$ be an $L$-contraction, i.e., $d(T(x),T(y)) \leq L\, d(x,y)$ for some $0<L<1$ and for all $x,y \in X$. Then $T$ has a unique fixed point.
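The proposition is constructive: starting from any $x_0$, the iterates $x_{n+1} = T(x_n)$ converge to the unique fixed point. A standard application in dynamic programming is the Bellman operator $Tv(s) = \max_a \{ r(s,a) + \beta \sum_{s'} p(s' \mid s,a)\, v(s') \}$, which for $0 < \beta < 1$ is a $\beta$-contraction on $\mathbb{R}^S$ with the sup norm (a complete metric space). The sketch below runs that iteration; the MDP (`REWARDS`, `TRANSITIONS`, `BETA`) and the helper names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP (illustrative numbers only).
REWARDS = np.array([[1.0, 0.0],
                    [0.0, 2.0]])                      # r(s, a)
TRANSITIONS = np.array([[[0.9, 0.1], [0.2, 0.8]],
                        [[0.5, 0.5], [0.0, 1.0]]])    # p(s' | s, a), rows sum to 1
BETA = 0.9                                            # discount factor = contraction modulus L


def bellman_operator(v, rewards, transitions, beta):
    """T v(s) = max_a [ r(s, a) + beta * sum_{s'} p(s' | s, a) v(s') ]."""
    q = rewards + beta * transitions @ v              # shape (S, A)
    return q.max(axis=1)


def fixed_point_iteration(op, x0, tol=1e-10, max_iter=10_000):
    """Iterate x_{n+1} = op(x_n) until successive iterates are tol-close in sup norm."""
    x = x0
    for n in range(max_iter):
        x_next = op(x)
        if np.max(np.abs(x_next - x)) < tol:
            return x_next, n + 1
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


def T(v):
    """The Bellman operator of the example MDP, as a map R^S -> R^S."""
    return bellman_operator(v, REWARDS, TRANSITIONS, BETA)


v_star, n_iter = fixed_point_iteration(T, np.zeros(2))
```

Starting from a different initial guess, for example `np.array([100.0, -50.0])`, should lead to the same `v_star` (approximately $(17.56, 20)$ for these numbers), which is the uniqueness part of the proposition. Since $d(x_{n+1}, x^*) \le \frac{L}{1-L} d(x_{n+1}, x_n)$, the stopping rule also bounds the distance to the true fixed point.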

Figures (2)

Figure 1: Aggregated states diagram illustrating the partitioning of 13 original states into four aggregated states $x_1$, $x_2$, $x_3$, and $x_4$. Each square represents an aggregated state containing various original states.
Figure 2: Online search combined with "offline" pre-trained values.

Theorems & Definitions (66)

Definition 1.1
Proposition 1.1
Definition 1.2
Remark 1.1
Remark 1.2
Remark 1.3
Remark 2.1
Remark 2.2
Example 2.1
Example 2.2
...and 56 more
