On estimation and order selection for multivariate extremes via clustering

Shiyuan Deng, He Tang, Shuyang Bai

TL;DR

This work introduces an extra penalty term to the well-known simplified average silhouette width, which penalizes small cluster sizes and small dissimilarities between cluster centers, and provides a consistent method for determining the order of a max-linear factor model, where a typical information-based approach is not viable.

Abstract

We investigate the estimation of multivariate extreme models with a discrete spectral measure using spherical clustering techniques. The primary contribution involves devising a method for selecting the order, that is, the number of clusters. The method consistently identifies the true order, i.e., the number of spectral atoms, and enjoys intuitive implementation in practice. Specifically, we introduce an extra penalty term to the well-known simplified average silhouette width, which penalizes small cluster sizes and small dissimilarities between cluster centers. Consequently, we provide a consistent method for determining the order of a max-linear factor model, where a typical information-based approach is not viable. Our second contribution is a large-deviation-type analysis for estimating the discrete spectral measure through clustering methods, which serves as an assessment of the convergence quality of clustering-based estimation for multivariate extremes. Additionally, as a third contribution, we discuss how estimating the discrete measure can lead to parameter estimations of heavy-tailed factor models. We also present simulations and real-data studies that demonstrate order selection and factor model estimation.
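
As a rough illustration of the quantity the abstract builds on, the sketch below computes the standard (unpenalized) simplified average silhouette width for points on the unit sphere. Several things here are assumptions and not taken from the paper: the function name `simplified_asw`, the use of cosine dissimilarity $1 - \mathbf{x}^\top \mathbf{c}$, and the convention that a point's silhouette is 0 when both dissimilarities vanish. The paper's extra penalty term, which penalizes small cluster sizes and small dissimilarities between cluster centers, is not reproduced because its exact form is not stated in this section.

```python
import numpy as np


def simplified_asw(points, centers, labels):
    """Simplified average silhouette width under cosine dissimilarity.

    points:  (n, d) array of unit vectors (e.g. projected extremal observations).
    centers: (m, d) array of unit-norm cluster centers, with m >= 2.
    labels:  (n,) integer array, index of the center assigned to each point.

    Hypothetical helper for illustration only; it omits the paper's penalty.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    if centers.shape[0] < 2:
        raise ValueError("need at least two centers")

    # Cosine dissimilarity between every point and every center.
    dissim = 1.0 - points @ centers.T
    idx = np.arange(points.shape[0])

    a = dissim[idx, labels]        # dissimilarity to the point's own center
    others = dissim.copy()
    others[idx, labels] = np.inf   # mask the own center
    b = others.min(axis=1)         # dissimilarity to the nearest other center

    denom = np.maximum(a, b)
    # Assumed convention: silhouette is 0 when both dissimilarities are 0.
    s = np.divide(b - a, denom, out=np.zeros_like(a), where=denom > 0)
    return float(s.mean())


# Usage: three points sitting exactly on two orthogonal centers.
pts = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
ctr = np.array([[1.0, 0.0], [0.0, 1.0]])
score = simplified_asw(pts, ctr, np.array([0, 0, 1]))
```

In an order-selection loop one would evaluate such a score for each test order $m$ after fitting spherical $k$-means with $m$ centers. The unpenalized score alone is only the starting point that the paper modifies.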

Paper Structure

This paper contains 29 sections, 12 theorems, 87 equations, 50 figures, and 1 table.

Key Result

Proposition 1

Suppose $\mathbf{X}$ satisfies conditions eq:equiv tail and eq:Lambda with a spectral measure $H$ on $\mathbb{S}_+^{d-1}$ as defined in eq:Lambda polar. Let $W_n$ denote the extremal subsample as in eq:W_n and $H_n$ denote the empirical spectral measure as in eq:emp spec. Then for any $S$ that is a

Figures (50)

  • Figure 1: A simulation instance taken from the $d=6$, $k=6$ setup in Section \ref{sec:sim}. The penalized Average Silhouette Width (ASW) $S_t$ (vertical axis) for spherical $k$-means clustering is plotted as a function of the test order $m$ (horizontal axis). Different values of the penalty parameter $t$ are shown in different colors. The true discrete spectral measure in \ref{eq:disc spec} is given by $(\mathbf{a}_1, p_1 )=((0.29, 0.21, 0.50, 0.45, 0.43, 0.49)^\top, 0.22)$, $(\mathbf{a}_2, p_2 ) =((0.74, 0.00, 0.59, 0.00, 0.32, 0.00)^\top, 0.10)$, $(\mathbf{a}_3, p_3 ) =((0.00, 0.27, 0.00, 0.47, 0.00, 0.84)^\top, 0.13)$, $(\mathbf{a}_4, p_4 )= ((0.33, 0.70, 0.63, 0.00, 0.00, 0.00)^\top,0.14 )$, $(\mathbf{a}_5, p_5 ) =((0.00, 0.00, 0.00, 0.81, 0.47, 0.34)^\top, 0.09)$, $(\mathbf{a}_6, p_6 ) =((0.48, 0.49, 0.25, 0.33, 0.53, 0.29)^\top, 0.32)$.
  • Figure 2: Simulation result visualization for the setup $d=4,k=2$ in Section \ref{sec:sim}. A column corresponds to a simulated dataset, and a row corresponds to a $t$ penalty parameter specification. See Section \ref{sec:sim} for more details.
  • Figure 3: Simulation result visualization for the setup $d=4,k=6$ in Section \ref{sec:sim}. A column corresponds to a simulated dataset, and a row corresponds to a $t$ penalty parameter specification. See Section \ref{sec:sim} for more details.
  • Figure 4: Simulation result visualization for the setup $d=6,k=6$ in Section \ref{sec:sim}. A column corresponds to a simulated dataset, and a row corresponds to a $t$ penalty parameter specification. See Section \ref{sec:sim} for more details.
  • Figure 5: Simulation result visualization for the setup $d=10,k=6$ in Section \ref{sec:sim}. A column corresponds to a simulated dataset, and a row corresponds to a $t$ penalty parameter specification. See Section \ref{sec:sim} for more details.
  • ...and 45 more figures

Theorems & Definitions (32)

  • Definition 1
  • Remark 1
  • Definition 2
  • Remark 2
  • Proposition 1
  • Proof
  • Corollary 1
  • Proof
  • Remark 3
  • Theorem 1
  • ...and 22 more