Model Selection Through Model Sorting

Mohammad Ali Hajiani, Babak Seyfe

TL;DR

The S-NER method, without any prior information, can be more accurate than feature-sorting algorithms such as orthogonal matching pursuit (OMP) that are aided by prior knowledge of the true model order.

Abstract

We propose a novel approach to selecting the best model for the data. Based on properties exclusive to nested models, we find the most parsimonious model containing the risk-minimizing predictor. We prove the existence of probably approximately correct (PAC) bounds on the difference between the minimum empirical risks of two successive nested models, called the successive empirical excess risk (SEER). Based on these bounds, we propose a model order selection method called nested empirical risk (NER). By sorting the models intelligently, the sorted NER (S-NER) method decreases the minimum risk. We construct a test that predicts whether expanding the model decreases the minimum risk. With high probability, NER selects the true model order and S-NER selects the most parsimonious model containing the risk-minimizing predictor. We apply S-NER model selection to linear regression and show that the S-NER method, without any prior information, can be more accurate than feature-sorting algorithms such as orthogonal matching pursuit (OMP) that are aided by prior knowledge of the true model order. On the UCR datasets, the NER method dramatically reduces the complexity of classification with a negligible loss of accuracy.
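The SEER quantity described in the abstract can be illustrated with a small sketch. The setting below is an assumption chosen for simplicity, not the paper's experiment: each model is a finite set of threshold classifiers, the loss is the 0-1 loss, and the data and threshold order are hypothetical. The PAC-based threshold that NER compares SEER against is not reproduced here.

```python
def empirical_risk(threshold, xs, ys):
    """Mean 0-1 loss of the classifier that predicts 1 when x >= threshold."""
    errors = sum(int(x >= threshold) != y for x, y in zip(xs, ys))
    return errors / len(xs)


def min_risks_of_nested_models(thresholds, xs, ys):
    """Minimum empirical risk of M_k = {first k thresholds}, for k = 1..L."""
    risks, best = [], float("inf")
    for t in thresholds:
        # M_k contains M_{k-1}, so the minimum can only stay equal or decrease.
        best = min(best, empirical_risk(t, xs, ys))
        risks.append(best)
    return risks


def successive_excess_risks(risks):
    """Difference of minimum empirical risks of successive nested models."""
    return [a - b for a, b in zip(risks, risks[1:])]


# Hypothetical toy data and candidate thresholds (illustration only).
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = [0.15, 0.35, 0.5, 0.75]

risks = min_risks_of_nested_models(thresholds, xs, ys)
seer = successive_excess_risks(risks)
```

In this toy example the last SEER value is zero, meaning that expanding the third model to the fourth does not lower the minimum empirical risk. A test in the spirit of NER would stop expanding at that point, subject to its own threshold.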

Paper Structure

This paper contains 20 sections, 14 theorems, 96 equations, 4 figures, 1 algorithm.

Key Result

Corollary 1

Let $\{\bar{\mathcal{M}}_k\}_{k=1}^L$ be an arbitrary set of models. Let $\mathcal{M}_1=\bar{\mathcal{M}}_1$ and, for every $k\in\{1,2,\ldots,L-1\}$, let $\mathcal{M}_{k+1}=\mathcal{M}_{k} \cup \bar{\mathcal{M}}_{k+1}$. Then the model family $\{\mathcal{M}_k\}_{k=1}^L$ is sequentially nested.
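The nesting process in Corollary 1 amounts to taking cumulative unions. A minimal sketch follows, assuming (for illustration only) that each model is represented by the set of feature indices it uses; the function name `nest_models` and the candidate sets are hypothetical.

```python
from itertools import accumulate


def nest_models(candidate_models):
    """Return M_1, ..., M_L with M_1 = first candidate and M_{k+1} = M_k | candidate_{k+1}."""
    frozen = (frozenset(m) for m in candidate_models)
    return list(accumulate(frozen, frozenset.union))


# Hypothetical candidates in arbitrary order, not nested in one another.
candidates = [{0, 1}, {1, 2}, {5}, {0}]
nested = nest_models(candidates)

# Every model in the constructed family contains its predecessor.
is_nested = all(a <= b for a, b in zip(nested, nested[1:]))
```

The union construction never removes elements, which is why the resulting family is nested regardless of how the candidates overlap.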

Figures (4)

  • Figure 1: Example of parameter spaces in the S-NER model selection procedure. Dashed lines denote the candidate models, and solid lines denote the models with the least minimum empirical risk among the candidate models.
  • Figure 2: Comparison of the true detection probability of the aided S-NER, S-NER, aided OMP, aided LARS, and the EFIC and EBICR methods using OMP and LARS as the feature sorting algorithm, for different SNRs. In this simulation, the number of observations is $n=60$, the number of features is $L=205$, and the order of the true model is $K=5$.
  • Figure 3: Comparison of the true detection probability of the aided S-NER, S-NER, aided OMP, aided LARS, and the EFIC and EBICR methods using OMP and LARS as the feature sorting algorithm, for different numbers of observations $n$. In this simulation, the SNR is $6$ dB, the number of features is $L=\lceil n^{1.3}\rceil$, and the order of the true model is $K=5$.
  • Figure 4: Accuracy and number of kernels (scaled to 9996) of mini-ROCKET versus NER, EFIC, AIC, BIC, and EBICR feature selection on the UCR datasets.

Theorems & Definitions (38)

  • Definition 1: Nested Models
  • Definition 2: Partially Nested Models
  • Definition 3: Non-Nested Models
  • Definition 4: Sequentially Nested Model Class
  • Corollary 1: Nesting Process
  • proof
  • Corollary 2
  • proof
  • Definition 5: Glivenko-Cantelli Function Class (wainwright2019high)
  • Lemma 1
  • ...and 28 more