Probability Density Estimation via Optimal Control
Markus Hegland, C. Yalçın Kaya
TL;DR
The paper addresses nonparametric density estimation from samples of an unknown distribution by recasting the penalized maximum log-likelihood problem as a multiprocess optimal control problem. Applying the maximum principle yields a two-point boundary-value problem (TPBVP) with interior jump conditions for the estimating function $v$, which is discretized and solved numerically with AMPL and Knitro. The resulting scheme is demonstrated on synthetic normal data and on real datasets (Old Faithful geyser and galaxy speeds), with performance competitive with kernel methods in R. The framework offers a flexible way to incorporate regularization and structure through the parameters $(\beta, \alpha)$, and it opens avenues for adding moment, quantile, and entropy constraints within an optimal control setting.
Abstract
We employ optimal control theory to study the problem of estimating the probability density function from a data set originating from an unknown probability distribution. The original variational problem is reformulated as a multi-stage optimal control problem and the associated maximum principle, or conditions of optimality, is reduced to a two-point boundary-value problem with interior conditions. A numerical scheme is proposed to solve the discretization of this problem. Estimates of density functions for synthetic and real data are computed using the proposed approach. The real data come from the Old Faithful geyser and the speeds of a group of galaxies. Comparisons are made with the popular statistics software R.
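For orientation, the kind of kernel baseline the comparison refers to can be sketched in a few lines. This is not the optimal-control scheme of the paper, and it is not the R code used by the authors; it is a minimal Gaussian kernel density estimate on synthetic normal data, assuming a rule-of-thumb bandwidth of the form $0.9 \min(\mathrm{sd}, \mathrm{IQR}/1.34)\, n^{-1/5}$. The function names `silverman_bandwidth` and `gaussian_kde`, the sample size, the seed, and the grid are all illustrative assumptions.

```python
import numpy as np


def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sd = x.std(ddof=1)  # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    # Fall back to the standard deviation if the IQR degenerates to zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(x, grid, h):
    """Evaluate a Gaussian kernel density estimate with bandwidth h on grid."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Pairwise standardized distances, shape (len(grid), len(x)).
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))


# Synthetic standard normal sample (assumed size and seed, for illustration).
rng = np.random.default_rng(0)
samples = rng.standard_normal(500)

grid = np.linspace(-5.0, 5.0, 401)
h = silverman_bandwidth(samples)
density = gaussian_kde(samples, grid, h)

# Trapezoidal check that the estimate integrates to roughly one on the grid.
mass = float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid)))
```

A density estimate produced by any other method, evaluated on the same `grid`, can be compared against `density` pointwise or through an integrated error measure.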
