Bayesian Circular Regression with von Mises Quasi-Processes

Yarden Cohen; Alexandre Khae Wu Navarro; Jes Frellsen; Richard E. Turner; Raziel Riemer; Ari Pakman

TL;DR

This work introduces von Mises Quasi-Processes (vMQP), a Bayesian nonparametric model for regression with circular responses, obtained by conditioning a two-dimensional Gaussian process (GP) on the unit circle. The resulting prior has a simple, maximum-entropy density. A Stratonovich-like augmentation enables fast Gibbs sampling for posterior inference, and the parameters are learned transductively with a fully Bayesian approach based on the Exchange and Double Metropolis-Hastings algorithms together with an efficient bridging scheme. The authors demonstrate the model on wind-direction prediction and gait-cycle phase estimation, showing competitive performance and capturing multimodal uncertainty, with Exponential kernels often outperforming alternatives. The approach provides a principled, transductive framework for circular regression with scalable Bayesian inference and potential extensions to statistical-physics models and score-matching learning.

Abstract

The need for regression models to predict circular values arises in many scientific fields. In this work we explore a family of expressive and interpretable distributions over circle-valued random functions related to Gaussian processes targeting two Euclidean dimensions conditioned on the unit circle. The probability model has connections with continuous spin models in statistical physics. Moreover, its density is very simple and is maximum-entropy, unlike previous Gaussian process-based approaches, which use wrapping or radial marginalization. For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Gibbs sampling. We argue that transductive learning in these models favors a Bayesian approach to the parameters and apply our sampling scheme to the Double Metropolis-Hastings algorithm. We present experiments applying this model to the prediction of (i) wind directions and (ii) the percentage of the running gait cycle as a function of joint angles.
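
To make the construction in the abstract concrete, the following is a minimal sketch of what conditioning a two-dimensional GP on the unit circle gives. It assumes a zero-mean GP whose two components are independent and share one covariance matrix $K$; substituting $(\cos\theta_i, \sin\theta_i)$ into the Gaussian quadratic form then yields a log-density of $-\frac{1}{2}\sum_{ij} (K^{-1})_{ij}\cos(\theta_i-\theta_j)$ up to a constant. The function names, the squared-exponential kernel, and the jitter term are illustrative assumptions, and the paper's own parameterization (including its observation model) may differ.

```python
import numpy as np


def rbf_kernel(x, sigma2=1.0, length=1.0, jitter=1e-6):
    """Squared-exponential kernel matrix on 1-D inputs (illustrative choice).

    A small diagonal jitter is added so that the matrix can be inverted stably.
    """
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]
    return sigma2 * np.exp(-d**2 / (2.0 * length**2)) + jitter * np.eye(len(x))


def log_density_unnormalized(theta, K):
    """Unnormalized log-density over angles theta (radians).

    Assumes a zero-mean 2-D GP with independent components of covariance K,
    restricted to points (cos(theta_i), sin(theta_i)) on the unit circle:
        -0.5 * sum_ij inv(K)_ij * cos(theta_i - theta_j)
    """
    theta = np.asarray(theta, dtype=float)
    M = np.linalg.inv(K)  # precision matrix of each GP component
    diff = theta[:, None] - theta[None, :]
    return -0.5 * np.sum(M * np.cos(diff))
```

Because only angle differences enter, this sketch is invariant under a global rotation of all angles, which is one way to see the link with continuous spin models mentioned in the abstract.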

Paper Structure (31 sections, 49 equations, 10 figures, 1 table)

INTRODUCTION
VON-MISES QUASI-PROCESSES
Relation to Gaussian Processes
Including noisy observations
SAMPLING CIRCULAR VARIABLES
The role of $\lambda$.
LEARNING THE PARAMETERS
Problems with point estimates
A fully Bayesian approach
The Exchange and Double MH algorithms.
Efficient Bridging for the vMQP.
RELATED WORKS
Relation to statistical physics.
Other circular models from Gaussian processes.
Variational inference.
...and 16 more sections

Figures (10)

Figure 1: Transductive learning in action. Given seven angular observations on the $x$ axis, we show histograms of posterior samples of the parameters of a vMQP model (\ref{eq:prior}) with kernel $K(x_i,x_j) = \sigma^2 \exp(-(x_i-x_j)^2/2l^2)$, for different numbers $m$ of uniformly located predictive locations. Transductive learning manifests itself in the changes of these distributions as a function of the predictive locations. Note the shrinking of $l^2$ as $m$ grows and the multimodality of $\nu$ captured by the Bayesian approach. The confidence interval in the top panel is proportional to the circular variance for $m=40$.
Figure 2: Problems with maximum likelihood estimation. Histograms of two different random evaluations of $\partial_{\kappa}U(\bm{\phi}, \bm{\theta})$, whose difference is required to estimate the log-likelihood gradient (\ref{eq:cd_gradient}) w.r.t. $\kappa$. The samples are from the model in \ref{fig:synth_data} with $m=40$. The similarity of the distributions leads to highly inaccurate gradient estimates.
Figure 3: Optimal $\lambda$. Relative Effective Sample Size (ESS) (see definition in \ref{app:RESS}) of the log of the density (\ref{eq:conditional_density}), computed with samples from the data presented in \ref{fig:synth_data} with $m=10$, as a function of the parameter $\lambda$ that enables the Cholesky decomposition (\ref{eq:Cholesky}). The plot confirms that smaller $\lambda$ should be preferred.
Figure 4: Samplers comparison. Autocorrelation function of the same log-density as in \ref{fig:ess_lambda}, computed using the augmented Gibbs and Hamiltonian Monte Carlo on the non-augmented (\ref{eq:conditional_density}) and augmented (\ref{eq:joint}) models. Evaluations were made between iterations such that the CPU time spent on each sample is roughly equal.
Figure 5: Left: Wind directions in 260 weather stations in Germany, randomly split between 208 train and 52 test locations. The predicted circular means from the vMQP model are indicated on the test locations. Right: Predicted circular variance over a uniform grid of $60 \times 60$ points. Note that the variance grows in regions close to train points with non-aligned directions. Both figures best seen in color.
...and 5 more figures
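
The captions above report predicted circular means and circular variances. As a hedged illustration of how such summaries could be computed from posterior samples of angles, the sketch below uses the standard definitions: the circular mean is the argument of the mean resultant vector, and the circular variance is one minus its length. The function name and the array layout (samples along the first axis) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def circular_mean_and_variance(samples):
    """Circular mean and variance of angle samples (radians).

    samples: array of shape (n_samples, ...); statistics are taken over axis 0.
    Returns (mean, variance), with mean in (-pi, pi] and variance in [0, 1].
    """
    # Map each angle to a point on the unit circle and average the points.
    z = np.exp(1j * np.asarray(samples, dtype=float)).mean(axis=0)
    # Direction of the average gives the mean; its length measures concentration.
    return np.angle(z), 1.0 - np.abs(z)
```

A variance near 0 indicates samples concentrated around one direction, while a value near 1 indicates samples spread around the circle or split between opposing directions, as can happen near training points with non-aligned directions.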
