Approximating the universal thermal climate index using sparse regression with orthogonal polynomials

Sabin Roman, Gregor Skok, Ljupco Todorovski, Saso Dzeroski

TL;DR

The Universal Thermal Climate Index (UTCI) is a multivariate, physiologically based measure of thermal comfort whose existing approximations (a 4D look-up table and a sixth-degree polynomial) trade accuracy for speed. The authors apply sparse regression on an orthogonal Legendre polynomial basis to decompose the UTCI Offset into a Fourier-like Legendre expansion, yielding uncorrelated, hierarchically structured coefficients and improved numerical conditioning. They report substantial accuracy gains over the standard baseline, achieving $\mathrm{RMSE} = 0.88\,^{\circ}\mathrm{C}$ at degree 10 (and $0.60\,^{\circ}\mathrm{C}$ at degree 16) with manageable parameter counts, and halving the share of large errors (errors above $2\,^{\circ}\mathrm{C}$ dropped from 8% to 4%). The sorted coefficient magnitudes decay approximately as $1/n$, indicating that higher-order terms add progressively smaller refinements. The work delivers a Pareto front for model complexity versus accuracy, robust generalization under 20/80 training/testing splits and bootstrapping, and an accessible Python implementation with domain validity checks, offering a practical and interpretable update to UTCI computation for environmental modeling and forecasting systems.
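The conditioning claim can be illustrated in one dimension. The following is a minimal sketch, not the authors' code: it assumes inputs already rescaled to $[-1, 1]$ and an arbitrary grid of 2000 points, and compares the design matrix of a degree-10 monomial basis with that of a Legendre basis of the same degree using NumPy's `legvander`.

```python
import numpy as np
from numpy.polynomial import legendre

# Sample points on [-1, 1], the interval on which Legendre polynomials are orthogonal.
# The grid size and the degree are illustrative choices, not values from the paper.
x = np.linspace(-1.0, 1.0, 2000)
degree = 10

# Design matrix in the monomial basis: columns 1, x, x^2, ..., x^degree.
monomial = np.vander(x, degree + 1, increasing=True)
# Design matrix in the Legendre basis: columns P_0(x), ..., P_degree(x).
legendre_basis = legendre.legvander(x, degree)

# A smaller condition number means a better-posed least-squares problem.
cond_monomial = np.linalg.cond(monomial)
cond_legendre = np.linalg.cond(legendre_basis)

# Correlation between the degree-2 and degree-4 columns in each basis.
corr_monomial = np.corrcoef(monomial[:, 2], monomial[:, 4])[0, 1]
corr_legendre = np.corrcoef(legendre_basis[:, 2], legendre_basis[:, 4])[0, 1]

print(f"condition number, monomial basis: {cond_monomial:.3g}")
print(f"condition number, Legendre basis: {cond_legendre:.3g}")
print(f"corr(x^2, x^4) = {corr_monomial:.3f}, corr(P_2, P_4) = {corr_legendre:.3f}")
```

In this setting the monomial columns $x^2$ and $x^4$ are strongly correlated while $P_2$ and $P_4$ are nearly uncorrelated, which is why coefficients fitted in the Legendre basis change little when higher-degree terms are added.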

Abstract

This article explores novel data-driven modeling approaches for analyzing and approximating the Universal Thermal Climate Index (UTCI), a physiologically based metric that integrates multiple atmospheric variables to assess thermal comfort. Given the nonlinear, multivariate structure of UTCI, we investigate symbolic and sparse regression techniques as tools for interpretable and efficient function approximation. In particular, we highlight the benefits of using orthogonal polynomial bases (such as Legendre polynomials) in sparse regression frameworks, demonstrating their advantages in stability, convergence, and hierarchical interpretability compared to standard polynomial expansions. We demonstrate that our models achieve significantly lower root-mean-square errors than the widely used sixth-degree polynomial benchmark while using the same or fewer parameters. By leveraging Legendre polynomial bases, we construct models that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training on just 20% of the data, our models generalize robustly to the remaining 80%, with consistent performance under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L2 (least squares) sense. We also connect these findings to the broader context of equation discovery in environmental modeling, referencing probabilistic grammar-based methods that enforce domain consistency and compactness in symbolic expressions. Taken together, these results illustrate how combining sparsity, orthogonality, and symbolic structure enables robust, interpretable modeling of complex environmental indices like UTCI and significantly outperforms the state-of-the-art approximation in both accuracy and efficiency.
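The workflow described in the abstract (a Legendre basis, an L1-penalized fit, training on 20% of the data and testing on the remaining 80%) can be sketched on a toy problem. Everything below is an assumption made for illustration: `target` is a hypothetical two-variable stand-in rather than the UTCI Offset, the inputs are assumed to be pre-scaled to $[-1, 1]$, the degree and penalty `alpha` are arbitrary, and the lasso is solved with a plain proximal-gradient (ISTA) loop instead of whichever solver the authors used.

```python
import numpy as np
from numpy.polynomial import legendre


def target(x, y):
    # Hypothetical smooth stand-in for the function being approximated (NOT the UTCI Offset).
    return np.exp(0.5 * x) * np.sin(2.0 * y) + 0.3 * x * y


def legendre_features(x, y, degree):
    # Tensor-product Legendre basis P_i(x) * P_j(y), restricted to total degree i + j <= degree.
    full = legendre.legvander2d(x, y, [degree, degree])
    i, j = np.divmod(np.arange(full.shape[1]), degree + 1)
    return full[:, i + j <= degree]


def lasso_ista(X, y, alpha, n_iter=2000):
    # Minimise (1 / 2n) * ||y - X w||^2 + alpha * ||w||_1 by proximal gradient descent.
    # The constant column P_0 * P_0 acts as an intercept and is penalised like any other term.
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y)) / n
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft-thresholding
    return w


rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5000, 2))  # inputs assumed pre-scaled to [-1, 1]
values = target(points[:, 0], points[:, 1])
X = legendre_features(points[:, 0], points[:, 1], degree=8)

# Train on 20% of the samples and evaluate on the remaining 80%.
n_train = len(points) // 5
X_train, y_train = X[:n_train], values[:n_train]
X_test, y_test = X[n_train:], values[n_train:]

coef = lasso_ista(X_train, y_train, alpha=1e-3)
rmse_test = np.sqrt(np.mean((X_test @ coef - y_test) ** 2))
n_nonzero = np.count_nonzero(coef)
print(f"test RMSE = {rmse_test:.4f} using {n_nonzero} of {X.shape[1]} basis terms")
```

Raising `alpha` removes more terms at the cost of a larger error, so sweeping it traces out the accuracy-versus-complexity trade-off for a given degree.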

Paper Structure

This paper contains 5 sections, 1 equation, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (a) 3D plot of the UTCI Offset [Brode2012] at 5% relative humidity, showing how wind speed ($\mathit{va}$), air temperature ($\mathit{Ta}$), and mean radiant temperature difference ($\mathit{Tr}-\mathit{Ta}$) combine to influence thermal stress. Color indicates the UTCI Offset magnitude across these environmental dimensions. (b) The different distributions of the water vapor pressure and relative humidity in the computed Offset dataset [Brode2012]. The water vapor pressure is strongly peaked at zero, while the relative humidity is uniform across its range.
  • Figure 2: The error of the standard polynomial UTCI approximation [Brode2012] for relative humidity of 5%. (a) The difference between the standard UTCI approximation and the accurate values of the Offset function. (b) Histogram of the differences showing a normal distribution centered at zero.
  • Figure 3: Loss versus number of parameters for different polynomial degrees. The regularization parameter was varied in the lasso regression to yield a Pareto front in model accuracy and complexity for each degree.
  • Figure 4: Parameters (or polynomial coefficients) and how they change for different polynomial degrees for (a) simple regression and (b) sparse regression (using Legendre basis). (c) Sorted sparse-regression coefficients (Legendre basis) versus parameter index on a logarithmic x-axis show a clear, Fourier-like decay with order (approximately $1/n$) that is stable across model capacities (degrees 4, 8, 12, 16), indicating a hierarchical structure where lower-order terms dominate and higher-order terms provide incremental refinement.
  • Figure 5: (a) Spatial distribution of the UTCI Offset error (approximation minus reference) for the new sparse-model-based approximation at a fixed relative humidity of 5%, showing small, smoothly varying discrepancies. (b) Comparison of error histograms for the standard UTCI approximation and the new approximation based on the tenth-degree Legendre polynomials. (c) Cumulative distributions of the absolute errors of the two approximations.
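
The quantities shown in Figures 3 and 5 are straightforward to compute once a set of fitted models and their errors is available. The sketch below is illustrative only: the helper names (`pareto_front`, `fraction_exceeding`, `empirical_cdf`) are hypothetical, and the model list and error values are made-up placeholders rather than results from the paper.

```python
import numpy as np


def pareto_front(models):
    # Keep the (n_params, loss) pairs that no other model beats on both criteria:
    # scanning in order of increasing size, a model stays only if it lowers the loss.
    front = []
    best_loss = float("inf")
    for n_params, loss in sorted(models):
        if loss < best_loss:
            front.append((n_params, loss))
            best_loss = loss
    return front


def fraction_exceeding(errors, threshold):
    # Share of samples whose absolute error is larger than the threshold.
    return float(np.mean(np.abs(errors) > threshold))


def empirical_cdf(errors):
    # Sorted absolute errors and the cumulative fraction of samples at or below each one.
    abs_sorted = np.sort(np.abs(errors))
    cumulative = np.arange(1, abs_sorted.size + 1) / abs_sorted.size
    return abs_sorted, cumulative


# Placeholder (number of parameters, loss) pairs, e.g. one per lasso penalty value.
models = [(10, 2.0), (20, 1.5), (20, 1.2), (30, 1.3), (40, 0.9)]
front = pareto_front(models)

# Placeholder errors (approximation minus reference), in degrees Celsius.
errors = np.array([-3.0, -0.5, 0.1, 0.4, 2.5])
share_large = fraction_exceeding(errors, 2.0)
abs_sorted, cumulative = empirical_cdf(errors)

print("Pareto front:", front)
print(f"share of errors above 2 degrees C: {share_large:.0%}")
```

Plotting `cumulative` against `abs_sorted` for two approximations gives a comparison of the kind described in Figure 5(c).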