Network Cross-Validation and Model Selection via Subsampling

Sayan Chakrabarty, Srijan Sengupta, Yuguo Chen

TL;DR

Numerical results show that NETCROP performs accurate cross-validation across a diverse set of network model selection and parameter tuning problems, and that it is computationally much faster, and often more accurate, than existing methods for network cross-validation.

Abstract

Large, complex networks are becoming increasingly prevalent in scientific applications across many domains. Although a number of models and methods exist for such networks, cross-validation on networks remains challenging due to the unique structure of network data. In this paper, we propose a general cross-validation procedure called NETCROP (NETwork CRoss-Validation using Overlapping Partitions). The key idea is to divide the original network into multiple subnetworks that share a common overlap part, producing training sets consisting of the subnetworks and a test set consisting of the node pairs between the subnetworks. This train-test split provides the basis for a network cross-validation procedure that can be applied to a wide range of model selection and parameter tuning problems for networks. The method is computationally efficient for large networks because it uses smaller subnetworks for the training step. We provide methodological details and theoretical guarantees for several model selection and parameter tuning tasks using NETCROP. Numerical results demonstrate that NETCROP performs accurate cross-validation on a diverse set of network model selection and parameter tuning problems. The results also indicate that NETCROP is computationally much faster, and often more accurate, than existing methods for network cross-validation.
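
The train-test split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `netcrop_split`, its parameters, the uniformly random node assignment, and the toy adjacency matrix are all assumptions made for illustration. It only shows the structure of the split: one shared overlap part, $s$ non-overlap parts, training subnetworks formed by the overlap plus one non-overlap part, and a test set of node pairs lying between different non-overlap parts.

```python
import numpy as np

def netcrop_split(n, s, overlap_size, rng):
    """Hypothetical helper: split n nodes into one overlap part and s non-overlap parts.

    Returns the overlap nodes, the list of non-overlap parts, the node sets of
    the s training subnetworks (overlap + one part each), and the test pairs
    (node pairs whose endpoints fall in two different non-overlap parts).
    """
    perm = rng.permutation(n)                        # assumed: nodes assigned uniformly at random
    overlap = perm[:overlap_size]                    # shared overlap part
    parts = np.array_split(perm[overlap_size:], s)   # s non-overlap parts
    train_nodes = [np.concatenate([overlap, part]) for part in parts]
    test_pairs = [
        (int(i), int(j))
        for p in range(s)
        for q in range(p + 1, s)
        for i in parts[p]
        for j in parts[q]
    ]
    return overlap, parts, train_nodes, test_pairs

# Toy symmetric adjacency matrix with no self-loops (illustrative data only).
rng = np.random.default_rng(0)
n = 10
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = upper + upper.T

overlap, parts, train_nodes, test_pairs = netcrop_split(n, s=2, overlap_size=4, rng=rng)

# Training data: one induced subnetwork per non-overlap part.
train_adjs = [A[np.ix_(nodes, nodes)] for nodes in train_nodes]

# Test data: adjacency entries for node pairs between the non-overlap parts.
test_edges = np.array([A[i, j] for i, j in test_pairs])
```

In this sketch, no test pair lies inside any training subnetwork, so a model fitted on the subnetworks is evaluated on entries it never saw. Each training subnetwork has only `overlap_size + (n - overlap_size) / s` nodes, which is the source of the computational savings mentioned in the abstract.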

Paper Structure

This paper contains 30 sections, 18 theorems, 143 equations, 5 figures, 4 tables, 8 algorithms.

Key Result

Theorem 1

If $A \sim SBM(n, K, B)$, then under the conditions in Assumption assump:SBM, the following bounds hold for NETCROP for SBM (Algorithm algo:SBM_DCBM) with the squared error loss as $n \to \infty$:

Figures (5)

  • Figure 1: Training subnetworks ($S_{01}$ and $S_{02}$) and testing set ($\mathbb S^c = S_1 \times S_2$) of NETCROP with overlap part $S_0$ and $s = 2$ non-overlap parts $S_1$ and $S_2$ ($s = 2$ subnetworks are used for illustration, while NETCROP can be used with $s \geq 2$).
  • Figure 2: Mean accuracy ($\%$) of regularized spectral clustering with multiple choices of $\tau$ and with $\hat{\tau}$ chosen from NETCROP with $R = 1$, and their mean and mode with $R = 5$.
  • Figure B3: Intersecting sets of $\mathcal{U}_{kk^\prime}^{(q)}$ and $\hat{\mathcal{U}}_{kk^\prime}^{(q)}$ for SBM
  • Figure B4: Intersecting sets of $\mathcal{U}_{kk^\prime}^{(q)}$ and $\hat{\mathcal{U}}_{kk^\prime}^{(q)}$ for DCBM
  • Figure C5: Accuracy, logarithm of mean runtime in seconds, and mean RAM usage in mebibytes (MiB) of NETCROP, NCV and ECV against the network size for small networks. The networks in the top and bottom rows were generated from SBM and DCBM, respectively, each with $K = 3$ communities. NETCROP was applied with $R \in \{1, 3, 5\}$ repetitions, and NCV and ECV with $R = 1$ and $R = 20$ (stabilized).

Theorems & Definitions (27)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Lemma B1
  • proof
  • Lemma B2
  • Theorem B1
  • ...and 17 more