Canonical-Correlation-Based Fast Feature Selection for Structural Health Monitoring

Sikai Zhang, Tingna Wang, Keith Worden, Limin Sun, Elizabeth J. Cross

Abstract

Feature selection refers to the process of selecting useful features for machine learning tasks, and it is also a key step for structural health monitoring (SHM). This paper proposes a fast feature selection algorithm by efficiently computing the sum of squared canonical correlation coefficients between monitored features and target variables of interest in greedy search. The proposed algorithm is applied to both synthetic and real datasets to illustrate its advantages in terms of computational speed, general classification and regression tasks, as well as damage-sensitive feature selection tasks. Furthermore, the performance of the proposed algorithm is evaluated under varying environmental conditions and on an edge computing device to investigate its applicability in real-world SHM scenarios. The results show that the proposed algorithm can successfully select useful features with extraordinarily fast computational speed, which implies that the proposed algorithm has great potential where features need to be selected and updated online frequently, or where devices have limited computing capability.
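
The greedy search described above can be illustrated with a minimal sketch. It is not the paper's fast algorithm: it recomputes the sum of squared canonical correlation coefficients (SSC) from scratch for every candidate feature, which is the slow baseline that the proposed method is designed to avoid. The function names `ssc` and `greedy_select`, the synthetic data, and the assumption that the centred feature and target matrices have full column rank are all illustrative choices, not taken from the paper.

```python
import numpy as np


def ssc(X, Y):
    """Sum of squared canonical correlations between the columns of X and Y.

    Assumes the centred X and Y have full column rank (illustrative assumption).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the centred column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx.T @ Qy are the canonical correlations, so the
    # squared Frobenius norm equals the sum of their squares.
    return float(np.sum((Qx.T @ Qy) ** 2))


def greedy_select(X, Y, k):
    """Naive forward selection: at each step add the feature that maximises SSC."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        scores = [ssc(X[:, selected + [j]], Y) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic example: the target depends only on features 1 and 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
Y = (2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.01 * rng.normal(size=200))[:, None]

chosen = greedy_select(X, Y, 2)
print(chosen)
```

Each greedy step here costs one QR factorisation per remaining candidate. That repeated work is the kind of cost an efficient SSC update would be expected to remove.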

Paper Structure

This paper contains 18 sections, 6 theorems, 45 equations, 10 figures, 4 tables, and 1 algorithm.

Key Result

Proposition 1

If $[\mathbf{X}]_{\mathbf{U}}$ and $[\mathbf{Y}]_{\mathbf{U}}$ are the coordinate matrices of $\mathbf{X}$ and $\mathbf{Y}$ with respect to the same matrix $\mathbf{U}$, then $[\augm{\mathbf{X}}{\mathbf{Y}}]_{\mathbf{U}} = \augm{[\mathbf{X}]_{\mathbf{U}}}{[\mathbf{Y}]_{\mathbf{U}}}$, where $\augm{\cdot}{\cdot}$ denotes an augmented matrix.
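
A small numerical check may help make the proposition concrete. Definition 1 is not reproduced on this page, so the sketch below assumes that the coordinate matrix $[\mathbf{X}]_{\mathbf{U}}$ is the matrix $\mathbf{C}$ satisfying $\mathbf{X} = \mathbf{U}\mathbf{C}$, with $\mathbf{U}$ of full column rank. Under that assumption, the coordinates of an augmented matrix are the augmented coordinates, because coordinates are computed column by column. The helper `coords` and the random test matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# U has full column rank (almost surely, for Gaussian entries).
U = rng.normal(size=(8, 5))

# Build X and Y inside the column space of U from known coordinates.
CX = rng.normal(size=(5, 2))
CY = rng.normal(size=(5, 3))
X = U @ CX
Y = U @ CY


def coords(M, U):
    """Coordinate matrix of M with respect to U, found by solving U @ C = M
    in the least-squares sense (assumed definition, see lead-in)."""
    return np.linalg.lstsq(U, M, rcond=None)[0]


# Coordinates of the augmented matrix [X Y] ...
lhs = coords(np.hstack([X, Y]), U)
# ... versus the augmented matrix of the separate coordinates.
rhs = np.hstack([coords(X, U), coords(Y, U)])

print(np.allclose(lhs, rhs))
```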

Figures (10)

  • Figure 1: The elapsed time of the definition-, $h$-correlation- and $\eta$-cosine-based feature selection methods.
  • Figure 2: Comparison of the averaged ACC results between different feature ranking criteria in classification tasks.
  • Figure 3: Comparison of the averaged R-squared results between different feature ranking criteria in regression tasks.
  • Figure 4: The schematic of the starboard wing and nine inspection panels.
  • Figure 5: Comparison between the proposed SSC-based algorithm and the eight feature selection methods in scikit-learn on classification accuracy and computational speed, where SSC(o) represents the SSC-based algorithm with ordinal encoding and SSC(d) represents the SSC-based algorithm with dummy encoding.
  • ...and 5 more figures

Theorems & Definitions (10)

  • Definition 1
  • Proposition 1
  • Theorem 1: Correlation Superposition Theorem
  • Corollary 1: Maximum Correlation Theorem
  • Lemma 1
  • Theorem 2: Cosine Superposition Theorem
  • Corollary 2: Maximum Cosine Theorem
  • proof
  • proof
  • proof