Projected random forests and conformal prediction of circular data

Paulo C. Marques F., Rinaldo Artes, Helton Graziadei

TL;DR

The paper develops conformal prediction for regression with circular responses under exchangeable data. It introduces a conformity score based on angular distance and a projection procedure that turns linear-response models into circular predictors. A key contribution is applying this projection to random forests and using out-of-bag conformal prediction, which avoids a separate calibration sample. Empirical results on synthetic and wind-direction datasets show that projected random forests produce prediction arcs with shorter median length (higher efficiency) than split conformal sets from a projected normal linear model and a circular forest, with empirical coverage close to the nominal level. Open-source software is provided to reproduce the analyses and results.
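To make the angular-distance score concrete, here is a minimal sketch. It assumes angles are measured in radians and that a prediction set is a symmetric arc around a point prediction. The function names are illustrative and are not taken from the paper's software.

```python
import math

def angular_distance(a, b):
    """Shortest arc length between angles a and b (radians), always in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def in_prediction_arc(y, center, radius):
    """True if angle y lies within `radius` of `center`, measured along the circle.

    Assumption: the prediction set is the arc of half-length `radius`
    centred at the point prediction `center`.
    """
    return angular_distance(y, center) <= radius
```

Unlike the absolute difference used for linear responses, this score treats angles just below 2π and just above 0 as close, which is what lets a prediction arc wrap around the origin.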

Abstract

We apply split conformal prediction techniques to regression problems with circular responses by introducing a suitable conformity score, leading to prediction sets with adaptive arc length and finite-sample coverage guarantees for any circular predictive model under exchangeable data. Leveraging the high performance of existing predictive models designed for linear responses, we analyze a general projection procedure that converts any linear response regression model into one suitable for circular responses. When random forests serve as basis models in this projection procedure, we harness the out-of-bag dynamics to eliminate the necessity for a separate calibration sample in the construction of prediction sets. For synthetic and real datasets the resulting projected random forests model produces more efficient out-of-bag conformal prediction sets, with shorter median arc length, when compared to the split conformal prediction sets generated by two existing alternative models.
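The abstract does not spell out the projection procedure, so the following is only one plausible reading: regress the cosine and the sine of the response with two linear-response models, then map the fitted point in the plane back to an angle with `atan2`. The base learner `fit_knn` is a hypothetical stand-in (a nearest-neighbour mean on a scalar feature) chosen to keep the sketch dependency-free; it is not the random forest used in the paper.

```python
import math

def fit_projected(fit, X, theta):
    """Build a circular predictor from a linear-response learner `fit`.

    Assumption: `fit(X, y)` returns a callable mapping a feature to a real value.
    """
    f_cos = fit(X, [math.cos(t) for t in theta])
    f_sin = fit(X, [math.sin(t) for t in theta])

    def predict(x):
        # Map the fitted point (cos-hat, sin-hat) back to an angle in [0, 2*pi).
        return math.atan2(f_sin(x), f_cos(x)) % (2 * math.pi)

    return predict

def fit_knn(X, y, k=3):
    """Hypothetical stand-in base learner: mean of the k nearest responses."""
    pairs = list(zip(X, y))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(v for _, v in nearest) / len(nearest)

    return predict

# Toy data in which the angle equals the feature.
X = [i / 10 for i in range(63)]
theta = list(X)
predict = fit_projected(fit_knn, X, theta)
```

Under this reading, any regressor for real-valued responses can be plugged in as `fit`. With a random forest as the base learner, each training unit's out-of-bag prediction could supply the conformity scores in place of a held-out calibration sample, which is the role the abstract assigns to the out-of-bag dynamics.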

Paper Structure

This paper contains 9 sections, 2 theorems, 16 equations, 4 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

The random vector $(\tilde{R}_1,\dots,\tilde{R}_n,\tilde{R}_{n+1})$ is exchangeable and $P(\tilde{R}_{n+1}\leq\tilde{r})\geq 1 - \alpha$.
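The threshold $\tilde{r}$ is not defined in this excerpt. A minimal sketch, assuming it is the usual conformal order statistic, that is, the $\lceil (1-\alpha)(n+1) \rceil$-th smallest of the $n$ calibration (or out-of-bag) scores, and that scores are angular distances bounded by $\pi$:

```python
import math

def conformal_threshold(scores, alpha):
    """Return the k-th smallest score, with k = ceil((1 - alpha) * (n + 1)).

    Assumption: scores are angular distances in [0, pi], so when k exceeds n
    the threshold is pi and the prediction set is the whole circle.
    """
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))
    if k > n:
        return math.pi
    return sorted(scores)[k - 1]
```

With exchangeable scores, the rank of the new score among all $n+1$ scores is uniformly distributed (ignoring ties), which is why a threshold of this form yields coverage of at least $1-\alpha$.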

Figures (4)

  • Figure 1: Circular histogram of the response variable in the synthetic dataset training sample, with concentration parameter $\kappa=5$.
  • Figure 2: Prediction intervals for fifty test sample units in the synthetic dataset, with $\kappa=5$, produced by the three different methods, using a miscoverage level $\alpha=0.1$. The black dots are the observed circular responses.
  • Figure 3: Circular histogram of the response variable in the wind direction dataset training sample.
  • Figure 4: Prediction intervals for fifty test sample units in the wind direction dataset, produced by the three different methods, using a miscoverage level $\alpha=0.1$. The black dots are the observed wind directions.

Theorems & Definitions (3)

  • Lemma 1
  • Theorem 1
  • Proof