Discrete Chi-Square Method beats Discrete Fourier Transform and other similar parametric time series analysis methods

Lauri Jetsu

TL;DR

The paper presents Discrete Chi-square Method (DCM) as a well-posed computational approach to nonlinear time-series analysis with unknown trends and multiple signals, contrasting it with the traditional DFT. By recasting g(t) as a sum of a harmonic component and an unknown trend, and by fixing frequencies to render the rest linear, DCM performs a massive yet structured set of linear LS fits and uses Fisher-test and Predictivity-test to select the best model. Across seven simulated data sets that challenge DFT (short windows, trends, near-equal or non-sinusoidal signals, uneven spacing), DCM reliably detects the true signals and trends while DFT often fails or produces biased periods and unstable residuals. The work demonstrates that, with sufficient data size and accuracy, DCM can forecast and postdict phenomena, providing a practical framework for complex time-series analysis beyond traditional spectral methods. It highlights the method’s potential for broad application in forecasting, non-stationary contexts, and unevenly sampled data, while acknowledging computational demands and the need for careful model selection and validation.

Abstract

We compare two time series analysis methods, the Discrete Fourier Transform (DFT) and our Discrete Chi-square Method (DCM). DCM is designed for detecting signal(-s) superimposed on an unknown trend. The analytical solution for the non-linear DCM model is an ill-posed problem. We present a computational statistical well-posed solution for this problem. The backbone of DCM is the Gauss-Markov theorem that the least squares fit is the best unbiased estimator for linear regression models. DCM can not fail because this simple time series analysis method computes a massive number of linear least squares fits. Hence, the data spacing, even or uneven, is irrelevant. We use the Fisher-test to identify the best DCM model from all alternative tested DCM models. This best model must also pass our Predictivity-test. Our analyses of simulated complex data sets expose the weaknesses of DFT and the efficiency of DCM. The DFT and DCM are frequency-domain parametric methods. There are many other similar parametric methods. Just like DFT, those other methods suffer from their own particular application limitations. The list of those limitations is long. DCM suffers from none of those limitations. The performance of DCM depends only on the data set size and accuracy. DCM is an ideal forecasting method because the data set time span $(ΔT)$ is irrelevant. It does not matter how long $(ΔT)$ and/or complex the data set is because DCM will inevitably detect the signal(-s) and the trend when the data set size $(n)$ and/or accuracy $(σ)$ become adequate.
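The central idea in the abstract, that fixing the tested frequency turns the non-linear model into a linear least squares problem, can be illustrated with a minimal sketch. This is not the author's implementation: the function names `design_matrix` and `chi_square_scan`, the parameters `n_harmonics` and `trend_order`, and the simulated data are all assumptions made for illustration. It covers a single signal on a polynomial trend and omits the Fisher-test and the Predictivity-test.

```python
import numpy as np

def design_matrix(t, freq, n_harmonics=1, trend_order=1):
    """Linear-model columns for one fixed trial frequency (hypothetical helper).

    Columns: polynomial trend terms, then a cos/sin pair per harmonic.
    """
    # Rescale time to [-1, 1] so the polynomial columns stay well conditioned.
    u = 2.0 * (t - t.min()) / (t.max() - t.min()) - 1.0
    cols = [u**k for k in range(trend_order + 1)]
    for j in range(1, n_harmonics + 1):
        phase = 2.0 * np.pi * j * freq * t
        cols.append(np.cos(phase))
        cols.append(np.sin(phase))
    return np.column_stack(cols)

def chi_square_scan(t, y, sigma, freqs, n_harmonics=1, trend_order=1):
    """Weighted linear least squares fit at every trial frequency.

    Returns the chi-square of each fit; the minimum marks the best frequency.
    """
    w = 1.0 / sigma
    chi2 = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        A = design_matrix(t, f, n_harmonics, trend_order) * w[:, None]
        beta, *_ = np.linalg.lstsq(A, y * w, rcond=None)
        resid = y * w - A @ beta
        chi2[i] = resid @ resid
    return chi2

# Simulated, unevenly spaced data (assumed values): linear trend plus one sine.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 200))
sigma = np.full(t.size, 0.05)
true_period = 1.8
y = 0.5 + 0.3 * t + np.sin(2.0 * np.pi * t / true_period + 0.4)
y = y + rng.normal(0.0, sigma)

freqs = np.linspace(0.1, 2.0, 2000)
chi2 = chi_square_scan(t, y, sigma, freqs)
best_period = 1.0 / freqs[np.argmin(chi2)]
print(f"best period: {best_period:.3f}")
```

Because each fit is an ordinary linear regression, nothing in the scan depends on the sampling being even. A fuller treatment would extend the grid to several frequencies at once and compare nested models, which is where the model-selection tests described in the abstract would come in.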

Paper Structure

This paper contains 21 sections, 45 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Model 1 (Table \ref{TableModelOne}: $n=50$, ${\mathrm{SN}}=50$ simulation). (a) DCM long search periodogram $z_1(f_1)$ gives best period at 1.843 (diamond). (b) DCM short search periodogram $z_1(f_1)$ gives best period at 1.822 (diamond). (c) DCM model $g(t)$ (black continuous line), DCM trend $p(t)$ (black dashed line) and data $y_i$ (black dots). (d) DCM model detrended $g(t)-p(t)$ (black continuous line), DCM signal $h_1(t)$ (red thick continuous line), detrended data $y(t_i)-p(t_i)$ (black dots) and DCM model residuals $y(t_i)-g(t_i)$ (blue dots) offset to -0.65 level (blue dotted line). (e) DFT periodogram $z_{\mathrm{DFT}}(f)$ gives best period at 1.190 (diamond). (f) DFT model $g_{\mathrm{DFT}}(t)$ (black continuous line), DFT trend $p_{\mathrm{DFT}}(t)$ (black dashed line) and data $y_i$ (black dots). (g) DFT model detrended $g_{\mathrm{DFT}}(t)-p_{\mathrm{DFT}}(t)$ (black continuous line), DFT pure sine $s_{\mathrm{DFT}}(t)$ (red thick continuous line), detrended data $y(t_i)-p_{\mathrm{DFT}}(t_i)$ (black dots) and DFT model residuals (blue dots) offset to -1.5 level (blue dotted line).
  • Figure 2: Model 2 (Table \ref{TableModelTwo}: $n=10~000$, ${\mathrm{SN}}=100$ simulation). (c) Colour of $g(t)$ line has been changed from black to white. (d) Colour of $g(t)$ line has been changed from black to white. Colour of offset level -0.3 dotted line has been changed from blue to white. Locations of best periods (diamonds) are explained in text (Section \ref{SectModelTwo}). Otherwise, notations are as in Figure \ref{FigModelOne}.
  • Figure 3: Model 3 (Table \ref{TableModelThree}: $n=50$, ${\mathrm{SN}}=50$ simulation). (a) DCM long search periodograms $z_1(f_1)$ (red) and $z_2(f)$ (blue) give best periods at 0.160 and 0.168 (diamonds). (b) DCM short search periodograms $z_1(f_1)$ (red) and $z_2(f)$ (blue) give best periods at 0.160 and 0.170 (diamonds). (c) DCM model $g(t)$ (black continuous line), DCM trend $p(t)$ (black dashed line) and data $y_i$ (black dots). (d) DCM model detrended $g(t)-p(t)$ (black continuous line), DCM signal $h_1(t)$ (red thick continuous line), DCM signal $h_2(t)$ (blue continuous thin line), detrended data $y(t_i)-p(t_i)$ (black dots) and DCM model residuals $y(t_i)-g(t_i)$ (blue dots) offset to -3.0 level (blue dotted line). (e) DFT periodogram $z_{\mathrm{DFT}}(f)$ for the original data gives best period at 0.163 (diamond). (f) DFT periodogram $z_{\mathrm{DFT}}(f)$ for the sine model residuals gives best period at 0.182 (diamond). (g) DFT model $g_{\mathrm{DFT}}(t)$ (black continuous line), DFT trend $p_{\mathrm{DFT}}(t)$ (black dashed line) and data $y_i$ (black dots). (h) DFT model detrended $g_{\mathrm{DFT}}(t)-p_{\mathrm{DFT}}(t)$ (black continuous line), DFT pure sine model for original data $s_{y,\mathrm{DFT}}(t)$ (red thick continuous line), DFT pure sine model for first residuals $s_{\epsilon,\mathrm{DFT}}(t)$ (blue continuous thin line), detrended data $y(t_i)-p_{\mathrm{DFT}}(t_i)$ (black dots) and DFT model residuals (blue dots) offset to -3.0 level (blue dotted line).
  • Figure 4: Model 4 (Table \ref{TableModelFour}: $n=50$, ${\mathrm{SN}}=50$ simulation). Notations as in Figure \ref{FigModelThree}. Best periods (diamonds) are explained in Section \ref{SectModelFour}.
  • Figure 5: Model 5 (Table \ref{TableModelFive}: $n=10~000$, ${\mathrm{SN}}=10~000$ simulation). Notations as in Figure \ref{FigModelThree}. Best periods (diamonds) are explained in Section \ref{SectModelFive}.
  • ...and 3 more figures