SEHFS: Structural Entropy-Guided High-Order Correlation Learning for Multi-View Multi-Label Feature Selection

Cheng Peng, Yonghao Li, Wanfu Gao, Jie Wen, Weiping Ding

TL;DR

The core idea of SEHFS is to convert the feature graph into a structural-entropy-minimizing encoding tree, quantifying the information cost of high-order dependencies and thus learning high-order feature correlations beyond pairwise correlations.
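The sketch below shows the quantity the TL;DR refers to. It rests on two assumptions. First, SEHFS is assumed to follow the standard structural entropy of Li and Pan (2016). Second, the encoding tree is restricted to two levels (root, one layer of clusters, leaves), so a tree is simply a partition of the features. The function name `partition_entropy` and the edge-list input are illustrative choices, not the paper's interface.

For a weighted graph with degrees $d_i$ and total volume $\mathrm{vol}(G)$, each cluster $X_j$ has volume $V_j$ and cut weight $g_j$. Its contribution is $-\frac{g_j}{\mathrm{vol}(G)}\log_2\frac{V_j}{\mathrm{vol}(G)} - \sum_{i\in X_j}\frac{d_i}{\mathrm{vol}(G)}\log_2\frac{d_i}{V_j}$.

```python
import math


def partition_entropy(n, edges, partition):
    """Two-level structural entropy H^P(G) of a weighted, undirected graph.

    n         -- number of nodes, labelled 0..n-1
    edges     -- list of (u, v, w) with u != v and weight w > 0
    partition -- list of disjoint node lists that together cover 0..n-1
    """
    # Weighted degree d_i of each node and total volume vol(G).
    d = [0.0] * n
    for u, v, w in edges:
        d[u] += w
        d[v] += w
    vol = sum(d)

    # Map node -> cluster index, then accumulate each cluster's cut weight g_j.
    owner = {i: j for j, cluster in enumerate(partition) for i in cluster}
    cut = [0.0] * len(partition)
    for u, v, w in edges:
        if owner[u] != owner[v]:
            cut[owner[u]] += w
            cut[owner[v]] += w

    h = 0.0
    for j, cluster in enumerate(partition):
        vj = sum(d[i] for i in cluster)  # cluster volume V_j
        if vj == 0:
            continue  # a cluster of isolated nodes costs nothing
        # Cost of reaching cluster j from the root.
        h -= cut[j] / vol * math.log2(vj / vol)
        # Cost of reaching each node from its own cluster.
        for i in cluster:
            if d[i] > 0:
                h -= d[i] / vol * math.log2(d[i] / vj)
    return h


# Two triangles joined by one weak edge (2, 3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
for name, p in [("singletons", [[i] for i in range(6)]),
                ("one cluster", [list(range(6))]),
                ("two triangles", [[0, 1, 2], [3, 4, 5]])]:
    print(f"{name:>13}: {partition_entropy(6, edges, p):.3f}")
```

In this example, the all-singletons and single-cluster trees both reduce to the one-dimensional entropy $-\sum_i \frac{d_i}{\mathrm{vol}(G)}\log_2\frac{d_i}{\mathrm{vol}(G)}$. Grouping each triangle into its own cluster gives a strictly lower value. This is the sense in which a good encoding tree "compresses" densely connected groups.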

Abstract

In recent years, multi-view multi-label learning (MVML) has attracted extensive attention because it closely matches real-world scenarios. Information-theoretic methods have gained prominence for their ability to learn nonlinear correlations. However, two key challenges persist. First, features in real-world data commonly exhibit high-order structural correlations, but existing information-theoretic methods struggle to learn such correlations. Second, because they commonly rely on heuristic optimization, information-theoretic methods are prone to converging to local optima. To address these two challenges, we propose a novel method called Structural Entropy-Guided High-Order Correlation Learning for Multi-View Multi-Label Feature Selection (SEHFS). The core idea of SEHFS is to convert the feature graph into a structural-entropy-minimizing encoding tree, quantifying the information cost of high-order dependencies and thus learning high-order feature correlations beyond pairwise correlations. Specifically, features exhibiting strong high-order redundancy are grouped into a single cluster within the encoding tree, while inter-cluster feature correlations are minimized, thereby eliminating redundancy both within and across clusters. Furthermore, SEHFS adopts a new framework that fuses information theory with matrix methods. It learns a shared semantic matrix and view-specific contribution matrices to reconstruct a global view matrix, which strengthens the information-theoretic method and balances global and local optimization. The ability of structural entropy to learn high-order correlations is theoretically established. Experiments on eight datasets from various domains, together with ablation studies, demonstrate that SEHFS achieves superior feature selection performance.
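The abstract's first challenge is that information-theoretic criteria built from per-feature scores can miss higher-order dependence. A classic toy case illustrates this. When a label is the XOR of two binary features, each feature alone carries zero mutual information about the label, yet the pair determines it completely. The sketch below computes empirical mutual information in bits using only the standard library. The XOR data and the helper `mutual_information` are illustrative and are not taken from the paper. They show a general limitation of low-order scores, not SEHFS's own correlation measure.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    p_xy = Counter(zip(xs, ys))
    p_x = Counter(xs)
    p_y = Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
               for (x, y), c in p_xy.items())


# Every combination of two binary features, each seen once; the label is their XOR.
samples = list(product([0, 1], repeat=2))
x1 = [a for a, _ in samples]
x2 = [b for _, b in samples]
y = [a ^ b for a, b in samples]

print(mutual_information(x1, y))                 # → 0.0  (feature 1 alone)
print(mutual_information(x2, y))                 # → 0.0  (feature 2 alone)
print(mutual_information(list(zip(x1, x2)), y))  # → 1.0  (both features jointly)
```

A criterion that scores each feature only by its own mutual information with the label would rank both features as useless here, even though together they determine the label.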

Paper Structure (28 sections, 19 equations, 9 figures, 6 tables, 1 algorithm)

Figures (9)

  • Figure 1: Concept maps of two types of information-theoretic methods. Each ball denotes a feature, and greater color similarity indicates higher correlation. The yellow and red balls exhibit a high-order correlation. (a) Existing methods based on mutual information, which are limited to learning low-order correlations; (b) SEHFS, guided by structural entropy, which learns high-order correlations.
  • Figure 2: The framework of SEHFS. Given a multi-view multi-label dataset $[\mathbf{X}_1;\mathbf{X}_2;\dots;\mathbf{X}_V]$, SEHFS first learns the semantic matrix $\mathbf{S}_v$ and the view-specific matrix $\mathbf{H}_v$ from the $v$-th view $\mathbf{X}_v$. The semantic matrices from all views are then integrated into a shared semantic matrix $\mathbf{S}$, whose graph structure is regularized by a graph Laplacian $\mathbf{L}_\mathbf{Y}$ learned from the label matrix $\mathbf{Y}$. The global view matrix $\mathbf{X}^f$ is reconstructed from $\mathbf{S}$ and $\mathbf{H}_v$ ($v \in \{1,\dots,V\}$), and a global feature graph $G^f$ is derived from it. Minimizing the structural entropy of $G^f$ yields an optimal encoding tree $\mathcal{T}$ that learns higher-order correlations and removes redundancy. Finally, mapping $\mathbf{Y}$ onto $\mathcal{T}$ produces a more effective, low-redundancy feature selection.
  • Figure 3: Structural-entropy-minimizing encoding trees in Scenario 1 and Scenario 2.
  • Figure 4: Subgraphs formed by nodes of the same color are densely connected, which is defined as "redundancy". By minimizing structural entropy, the feature graph $G^f$ is converted into an optimal encoding tree $\mathcal{T}$, which increases intra-layer edge weights while decreasing the weights from nodes to the root of their parent layer.
  • Figure 5: Performance comparison of SEHFS and seven baseline methods on SCENE (first row), VOC07 (second row), and MIRFlickr (third row). The horizontal axis denotes the number of selected features, and the vertical axes report Average Precision (AP), Coverage (Cov), Hamming Loss (HL), and Ranking Loss (RL).
  • ...and 4 more figures
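Figures 2 and 4 describe minimizing structural entropy so that densely connected (redundant) feature groups end up in the same cluster of the encoding tree. The sketch below illustrates that idea under three assumptions:

- the tree is restricted to two levels;
- structural entropy follows the standard Li and Pan (2016) definition;
- the search is a plain greedy merge heuristic: start from singletons, apply the merge that lowers the entropy most, and stop when no merge helps.

The heuristic and the helper names `two_level_entropy` and `greedy_merge` are illustrative. This is not claimed to be the optimization procedure SEHFS uses.

```python
import math
from itertools import combinations


def two_level_entropy(n, edges, clusters):
    """Structural entropy of a two-level encoding tree whose clusters are `clusters`.

    edges are (u, v, w) triples with u != v and w > 0; clusters is a list of sets.
    """
    d = [0.0] * n
    for u, v, w in edges:
        d[u] += w
        d[v] += w
    vol = sum(d)
    owner = {i: j for j, c in enumerate(clusters) for i in c}
    cut = [0.0] * len(clusters)  # weight of edges leaving each cluster
    for u, v, w in edges:
        if owner[u] != owner[v]:
            cut[owner[u]] += w
            cut[owner[v]] += w
    h = 0.0
    for j, c in enumerate(clusters):
        vj = sum(d[i] for i in c)
        if vj > 0:
            h -= cut[j] / vol * math.log2(vj / vol)
            h -= sum(d[i] / vol * math.log2(d[i] / vj) for i in c if d[i] > 0)
    return h


def greedy_merge(n, edges, tol=1e-12):
    """Greedily merge clusters while some merge strictly lowers the entropy."""
    clusters = [{i} for i in range(n)]
    current = two_level_entropy(n, edges, clusters)
    while len(clusters) > 1:
        best_h, best_pair = current - tol, None
        for a, b in combinations(range(len(clusters)), 2):
            rest = [c for k, c in enumerate(clusters) if k not in (a, b)]
            h = two_level_entropy(n, edges, rest + [clusters[a] | clusters[b]])
            if h < best_h:
                best_h, best_pair = h, (a, b)
        if best_pair is None:
            break  # no merge lowers the entropy: a local optimum
        a, b = best_pair
        rest = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters = rest + [clusters[a] | clusters[b]]
        current = best_h
    return clusters


# Two triangles joined by one weak edge (2, 3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
print(sorted(sorted(c) for c in greedy_merge(6, edges)))  # → [[0, 1, 2], [3, 4, 5]]
```

On this graph the search groups each triangle into its own cluster and then stops. Merging the two triangles would raise the entropy back to its one-dimensional value. Recomputing the full entropy for every candidate pair keeps the sketch short but costs roughly $O(k^2 |E|)$ per step for $k$ clusters. Computing the entropy change of a merge incrementally is the usual way to speed this up.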

Theorems & Definitions (2)

  • Definition 1: Encoding Tree
  • Definition 2: Structural Entropy
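The listing above gives only the names of the two definitions. For reference, the standard formulations from Li and Pan's (2016) structural information theory are given below. That SEHFS uses exactly this form is an assumption, and the paper's notation may differ. Here $d_i$ is the weighted degree of vertex $i$, $\alpha^-$ is the parent of tree node $\alpha$, and $\lambda$ is the root.

```latex
% Definition 1 (encoding tree), standard form: a rooted tree T over G = (V, E, w) in which
% every tree node \alpha is associated with a vertex subset T_\alpha, the root \lambda has
% T_\lambda = V, the children of \alpha partition T_\alpha, and each leaf holds one vertex.
\mathrm{vol}(\alpha) = \sum_{i \in T_\alpha} d_i,
\qquad
g_\alpha = \sum_{\substack{(i,j) \in E \\ i \in T_\alpha,\ j \notin T_\alpha}} w_{ij}

% Definition 2 (structural entropy), standard form:
\mathcal{H}^{\mathcal{T}}(G) = -\sum_{\alpha \in \mathcal{T},\ \alpha \neq \lambda}
    \frac{g_\alpha}{\mathrm{vol}(G)} \log_2 \frac{\mathrm{vol}(\alpha)}{\mathrm{vol}(\alpha^-)},
\qquad
\mathcal{H}^{(K)}(G) = \min_{\mathcal{T}:\ \mathrm{height}(\mathcal{T}) \le K} \mathcal{H}^{\mathcal{T}}(G)
```

For a two-level tree (root, one layer of clusters, leaves) without self-loops, each leaf $\{i\}$ has $g_{\{i\}} = d_i$. The sum therefore reduces to two terms per cluster: the cost of entering the cluster from the root, plus the cost of locating each vertex inside it.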