Stochastic dynamic programming with non-linear discounting

Nicole Bäuerle, Anna Jaśkiewicz, Andrzej S. Nowak

TL;DR

The paper tackles stochastic dynamic programming with non-linear discounting by introducing a recursive discounted utility framework on a Borel state space. It develops two equivalent formulations of the utility evaluation and proves that the Bellman equation has a solution and that an optimal stationary policy exists in the infinite horizon, handling both bounded and unbounded one-period utilities via a weight function and subadditive contraction arguments. The analysis extends prior work by allowing non-linear discounting in a stochastic setting and provides practical algorithms, namely policy iteration and Howard's algorithm, for computing the optimal policy, with applications to growth, inventory, and stopping problems. The results matter for economic and financial models where non-linear attitudes toward future rewards and unbounded utilities arise, since they make stationary policy optimization possible in such settings.

Abstract

In this paper, we study a Markov decision process with a non-linear discount function and with a Borel state space. We define a recursive discounted utility, which resembles non-additive utility functions considered in a number of models in economics. Non-additivity here follows from non-linearity of the discount function. Our study is complementary to the work of Jaśkiewicz, Matkowski and Nowak (Math. Oper. Res. 38 (2013), 108-121), where non-linear discounting is also used in the stochastic setting, but the expectation of utilities aggregated on the space of all histories of the process is applied, leading to a non-stationary dynamic programming model. Our aim is to prove that in the recursive discounted utility case the Bellman equation has a solution and there exists an optimal stationary policy for the problem in the infinite time horizon. Our approach includes two cases: $(a)$ when the one-stage utility is bounded on both sides by a weight function multiplied by some positive and negative constants, and $(b)$ when the one-stage utility is unbounded from below.
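To make the role of the non-linear discount function concrete, the following is a minimal value-iteration sketch on a finite state and action space. It is an illustration under stated assumptions, not the paper's algorithm: the Bellman operator is assumed to take the form $(Tv)(x)=\max_{a}\big[u(x,a)+\sum_y \delta(v(y))\,q(y|x,a)\big]$, which is not written out in this summary, and the names `value_iteration`, `u`, `q`, `delta` and the example numbers are all hypothetical. The example discount function is chosen to be increasing, to vanish at zero, and to have Lipschitz constant 0.9, so that the iteration contracts.

```python
import numpy as np

def value_iteration(u, q, delta, tol=1e-10, max_iter=10_000):
    """Iterate the (assumed) operator (T v)(x) = max_a [u(x,a) + sum_y delta(v(y)) q(y|x,a)].

    u     : array of shape (S, A), one-stage utilities
    q     : array of shape (S, A, S), transition probabilities q[x, a, y]
    delta : non-linear discount function, applied elementwise to v
    """
    v = np.zeros(u.shape[0])
    for _ in range(max_iter):
        # q @ delta(v) has shape (S, A): expected discounted continuation value
        v_new = (u + q @ delta(v)).max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    # Stationary policy that is greedy with respect to the final iterate
    policy = (u + q @ delta(v)).argmax(axis=1)
    return v, policy

# Hypothetical two-state, two-action example (numbers are illustrative only).
u = np.array([[1.0, 0.5],
              [0.0, 2.0]])
q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])

def delta(z):
    # Increasing, delta(0) = 0, slope at most 0.9: an assumed example discount
    return 0.9 * z / (1.0 + 0.1 * np.abs(z))

v, policy = value_iteration(u, q, delta)
```

With `delta = lambda z: 0.9 * z` the same routine reduces to standard discounted value iteration, which gives a convenient sanity check for the non-linear case.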

Paper Structure

This paper contains 12 sections, 14 theorems, 141 equations.

Key Result

lemma 1

Assume that $A(x)$ is compact for each $x\in X.$ $(a)$ Let $g\in {\cal M}(D)$ be such that $a\mapsto g(x,a)$ is upper semicontinuous on $A(x)$ for each $x\in X.$ Then $g^*(x):=\max_{a\in A(x)} g(x,a)$ is Borel measurable and there exists a Borel measurable mapping $f^*:X\to A$ such that $g^*(x)=g(x,f^*(x))$ for all $x\in X.$ $(b)$ If, in addition, we assume that $x\mapsto A(x)$ is upper semicontinuous and $g\in {\cal U}(D),$ then $g^*\in {\cal U}(X).$

Theorems & Definitions (36)

  • remark 1
  • remark 2
  • remark 3
  • remark 4
  • remark 5
  • definition 1
  • remark 6
  • definition 2
  • lemma 1
  • lemma 2
  • ...and 26 more