Probability Density Estimation via Optimal Control

Markus Hegland, C. Yalçın Kaya

TL;DR

The paper tackles nonparametric density estimation from samples drawn from an unknown distribution by recasting the penalized maximum log-likelihood problem as a multiprocess optimal-control problem. A maximum-principle analysis yields a two-point boundary-value problem (TPBVP) with interior jumps, which is solved via a novel discretization and an AMPL–Knitro-based solver. The theoretical results provide a TPBVP for the estimating function $v$ and a practical scheme to compute density estimates, demonstrated on synthetic normal data and on real datasets (Old Faithful geyser and galaxy speeds), with performance competitive with kernel methods in R. The framework offers a flexible way to incorporate regularization and structure through the parameters $(\beta, \alpha)$, and it opens avenues for adding moment, quantile, and entropy constraints within an optimal-control setting.

Abstract

We employ optimal control theory to study the problem of estimating the probability density function from a data set originating from an unknown probability distribution. The original variational problem is reformulated as a multi-stage optimal control problem and the associated maximum principle, or conditions of optimality, is reduced to a two-point boundary-value problem with interior conditions. A numerical scheme is proposed to solve the discretization of this problem. Estimates of density functions for synthetic and real data are computed using the proposed approach. The real data come from the Old Faithful geyser and the speeds of a group of galaxies. Comparisons are made with the popular statistics software R.
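For orientation, the kernel method used as the comparison baseline can be sketched in a few lines. This is not the optimal-control estimator of the paper; it is a minimal Gaussian kernel density estimate, assuming a rule-of-thumb bandwidth of the kind R's `density()` uses by default (`bw.nrd0`). The function names, the sample size, and the grid are illustrative assumptions.

```python
import numpy as np


def rule_of_thumb_bandwidth(samples):
    """Silverman-style rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = samples.size
    sd = samples.std(ddof=1)
    q75, q25 = np.percentile(samples, [75, 25])
    iqr = q75 - q25
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate on the points in `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    h = rule_of_thumb_bandwidth(samples) if bandwidth is None else bandwidth
    # Pairwise standardized distances: one row per grid point, one column per sample.
    z = (grid[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / h


# Synthetic standard-normal data (illustrative sample size and seed).
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=200)
grid = np.linspace(-5.0, 5.0, 501)
density = gaussian_kde(samples, grid)

# Trapezoidal rule: the estimate should integrate to roughly 1 over the grid.
area = float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid)))
print(f"approximate integral of the estimate: {area:.4f}")
```

The bandwidth plays the role of the smoothing knob here; the paper's approach controls smoothness through its own regularization parameters instead.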

Paper Structure

This paper contains 13 sections, 4 theorems, 38 equations, 4 figures, 3 tables.

Key Result

Lemma 1

One has that $\lambda_0 > 0$, i.e., that Problem (OCP) is normal. In particular, one can take $\lambda_0 = 1$, and so the optimal control can be written as $u(t) = -\lambda_2(t)$.
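As a hedged illustration of why normality matters for this formula: suppose (an assumption, since the Hamiltonian is not reproduced here) that the Hamiltonian $H$ of Problem (OCP) depends on the control only through the terms $\tfrac{1}{2}\lambda_0 u^2 + \lambda_2 u$. Stationarity in $u$ would then give

$$
\frac{\partial H}{\partial u} = \lambda_0\, u(t) + \lambda_2(t) = 0
\quad\Longrightarrow\quad
u(t) = -\frac{\lambda_2(t)}{\lambda_0},
$$

which is well defined only when $\lambda_0 \neq 0$. Because the multipliers can be rescaled by any positive constant, $\lambda_0 > 0$ can be normalized to $\lambda_0 = 1$, and the expression reduces to $u(t) = -\lambda_2(t)$.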

Figures (4)

  • Figure 1: Example 1---Normal distribution---density function estimated from various sets of sampled data via optimal control.
  • Figure 2: Example 1---Normal distribution---density function estimated with the kernel method in R, using the same data sets as in Figure 1.
  • Figure 3: Example 2---The Old Faithful---estimated density functions.
  • Figure 4: Example 3---Galaxies---estimated density functions.

Theorems & Definitions (7)

  • Lemma 1: Normality
  • Remark 1: Jumps in Optimal Control
  • Theorem 1: Necessary Condition of Optimality
  • Remark 2: Smoothness Parameter $\alpha$
  • Corollary 1: Parameter $\gamma$
  • Remark 3: Asymptotic value of $\gamma$
  • Theorem 2: Order Reduction