Nonlinear Modal Interval Regression for Bivariate Data Analysis

Sai Yao, Yuko Araki, Osuke Iwata

Abstract

Understanding the variability of a distribution requires a description of its dispersion, not only its central tendency, and variability is of considerable interest in fields such as life sciences, meteorology, and economics. The modal interval (MI) describes the dispersion, or spread, of a distribution and represents the most concentrated interval of a univariate unimodal distribution. In this study, we propose a nonlinear modal interval regression (MIR) method that smoothly estimates a conditional MI, providing a robust description of how the dispersion of a data distribution varies with the covariate. First, we use kernel density estimation (KDE) to estimate the quantile levels corresponding to the conditional MI bounds, which serve as input to the quantile loss function. Second, we fit the upper and lower bound functions by minimizing the quantile loss with smoothing splines. Numerical experiments demonstrate that the reformulated MIR achieves higher accuracy and stability than both the conventional MIR and the KDE methods. To evaluate the effectiveness of the proposed approach, we applied the method to neonatal hormone data and identified notable rhythms in cortisol and melatonin levels during the first ten days after birth.
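As a rough illustration of the two ingredients named in the abstract (quantile levels for the MI bounds, and the quantile loss), the following Python sketch works on a single univariate sample. It is not the authors' procedure: it assumes the MI at level `alpha` can be taken as the shortest interval containing a fraction `alpha` of the data, and it swaps the KDE step for a simple sorted-sample search. The function names `empirical_modal_interval` and `pinball_loss` are hypothetical.

```python
import numpy as np

def empirical_modal_interval(y, alpha=0.5):
    """Shortest interval covering about a fraction alpha of the sample.

    Assumption: this sorted-sample search stands in for the KDE-based
    estimate described in the abstract; it is a simplification.
    Returns (lower, upper, tau_lower, tau_upper), where the tau values
    are the empirical quantile levels of the two bounds.
    """
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    k = int(np.ceil(alpha * n))  # number of points the interval must cover
    if k < 2 or k > n:
        raise ValueError("alpha * n must give between 2 and n points")
    # Width of every window of k consecutive order statistics
    widths = y[k - 1:] - y[: n - k + 1]
    i = int(np.argmin(widths))   # start index of the narrowest window
    lower, upper = y[i], y[i + k - 1]
    tau_lower = i / n            # fraction of points strictly below the window
    tau_upper = (i + k) / n      # fraction of points up to the window's end
    return lower, upper, tau_lower, tau_upper

def pinball_loss(y, q, tau):
    """Mean quantile (pinball) loss of predictions q at quantile level tau."""
    r = np.asarray(y, dtype=float) - q
    return float(np.mean(np.maximum(tau * r, (tau - 1.0) * r)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.gamma(shape=2.0, scale=1.0, size=1000)  # skewed, unimodal
    lo, hi, t_lo, t_hi = empirical_modal_interval(sample, alpha=0.5)
    print(f"50% MI: [{lo:.3f}, {hi:.3f}], quantile levels ({t_lo:.3f}, {t_hi:.3f})")
    print("pinball loss at lower bound:", pinball_loss(sample, lo, t_lo))
```

For a skewed sample the two quantile levels are generally not 0.25 and 0.75, which is why the levels are estimated first and then passed to the quantile loss when fitting the bound functions.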

Paper Structure (31 sections, 34 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Daily precipitation ($\geq$0.1 mm) in Sendai, Japan (July 2014--July 2024). (a) The blue solid line represents the mean, the orange dashed line represents the median, and the red dotted line represents the mode. (b) The blue interval represents the range between the first ($Q_1$) and third ($Q_3$) quartiles, whose width is the IQR, and the red interval represents the 50% MI.
  • Figure 2: Logical flow of the proposed MIR framework from definition and assumptions to methodological framework.
  • Figure 3: Penalty multiplier in the mCWC criterion as a function of $\alpha - \mathrm{MICP}$. When $\mathrm{MICP}\geq \alpha$, the multiplier equals 1 (no penalty). When $\mathrm{MICP}< \alpha$, the multiplier increases as $\exp[-\eta(\mathrm{MICP}-\alpha)]$ for $\eta\in\{5,10,20,40\}$, yielding larger penalties as coverage falls further below the target. The dashed vertical line marks $\mathrm{MICP}= \alpha$.
  • Figure 4: Visualization of the simulated data under (a) Distribution 1 and (b) Distribution 2. The band regions represent the true 50% conditional MI theoretically derived from the known distributions, and the blue points denote 1,000 randomly generated data points from each distribution.
  • Figure 5: RMSE versus sample size for the three methods under (a) Distribution 1 and (b) Distribution 2. The lines show the mean RMSE over 50 repetitions.
  • ...and 3 more figures
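
The penalty multiplier described in the Figure 3 caption can be written out directly. The Python sketch below implements only that multiplier as the caption states it (1 when $\mathrm{MICP}\geq \alpha$, and $\exp[-\eta(\mathrm{MICP}-\alpha)]$ otherwise); the full mCWC criterion is not given in this excerpt and is not reproduced here. The function name `mcwc_penalty_multiplier` is hypothetical.

```python
import numpy as np

def mcwc_penalty_multiplier(micp, alpha, eta):
    """Penalty multiplier as described in the Figure 3 caption.

    Equals 1 when micp >= alpha (no penalty) and
    exp(-eta * (micp - alpha)) when micp < alpha.
    Accepts scalars or arrays for micp.
    """
    micp = np.asarray(micp, dtype=float)
    return np.where(micp >= alpha, 1.0, np.exp(-eta * (micp - alpha)))

if __name__ == "__main__":
    coverage = np.array([0.30, 0.40, 0.50, 0.60])
    for eta in (5, 10, 20, 40):  # the eta values listed in the caption
        print(eta, mcwc_penalty_multiplier(coverage, alpha=0.5, eta=eta))
```

Larger values of `eta` make the multiplier grow faster as coverage falls below the target, which matches the behavior the caption describes.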