Combining LLM Semantic Reasoning with GNN Structural Modeling for Multi-View Multi-Label Feature Selection

Zhiqi Chen, Yuzhou Liu, Jiarui Liu, Wanfu Gao

TL;DR

This paper tackles multi-view multi-label feature selection by integrating semantic priors derived from Large Language Models with graph-based structural modeling. It introduces a semantic-aware two-level heterogeneous graph that fuses LLM-derived semantic edges with statistical edges computed from mutual information and label co-occurrence, and it uses a type-aware Graph Attention Network to learn feature embeddings and saliency scores. Empirical results on six benchmark datasets show consistent improvements over state-of-the-art baselines and demonstrate robustness on small-scale datasets, highlighting the value of combining semantic and statistical information. The approach offers practical gains for high-dimensional, multimodal data and opens avenues for end-to-end LLM feedback and self-supervised pretraining on heterogeneous graphs.
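As a rough illustration of the statistical edges mentioned above, the sketch below computes feature-label mutual information and label-label co-occurrence on toy binary data. It is not the authors' implementation: the function names, the toy matrices `X` and `Y`, and the use of a Jaccard ratio as the co-occurrence weight are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def label_cooccurrence(Y):
    """Jaccard co-occurrence for every pair of label columns in a binary matrix.

    The Jaccard ratio is an assumed weighting, chosen only for illustration.
    """
    n_labels = len(Y[0])
    edges = {}
    for a in range(n_labels):
        for b in range(a + 1, n_labels):
            both = sum(1 for row in Y if row[a] and row[b])
            either = sum(1 for row in Y if row[a] or row[b])
            edges[(a, b)] = both / either if either else 0.0
    return edges


# Hypothetical toy data: 4 samples, 2 discrete features, 2 binary labels.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
Y = [[0, 0], [0, 1], [1, 1], [1, 1]]

# Feature-label edges weighted by mutual information.
feature_label_edges = {
    (f, l): mutual_information([row[f] for row in X], [row[l] for row in Y])
    for f in range(len(X[0]))
    for l in range(len(Y[0]))
}

# Label-label edges weighted by co-occurrence.
label_label_edges = label_cooccurrence(Y)
```

In this toy data, feature 0 matches label 0 on every sample and so receives the largest mutual-information weight, while feature 1 is independent of label 0 and receives a weight of zero.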

Abstract

Multi-view multi-label feature selection aims to identify informative features from heterogeneous views, where each sample is associated with multiple interdependent labels. This problem is particularly important in machine learning involving high-dimensional, multimodal data such as social media, bioinformatics, or recommendation systems. Existing Multi-View Multi-Label Feature Selection (MVMLFS) methods mainly analyze the statistical information of the data but seldom consider semantic information. In this paper, we use these two types of information jointly and propose a method that combines the semantic reasoning of Large Language Models (LLMs) with the structural modeling of Graph Neural Networks (GNNs) for MVMLFS. The method consists of three main components. (1) An LLM is first used as an evaluation agent to assess the latent semantic relevance among feature, view, and label descriptions. (2) A semantic-aware heterogeneous graph with two levels is designed to represent relations among features, views, and labels: one level is a semantic graph representing semantic relations, and the other is a statistical graph. (3) A lightweight Graph Attention Network (GAT) is applied to learn node embeddings in the heterogeneous graph, which serve as feature saliency scores for ranking and selection. Experimental results on multiple benchmark datasets demonstrate the superiority of our method over state-of-the-art baselines. The method also remains effective on small-scale datasets, showcasing its robustness, flexibility, and generalization ability.
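To make component (3) more concrete, here is a minimal single-head graph attention pass followed by a saliency ranking of the feature nodes. It is a sketch under several assumptions and not the paper's model: it omits the learned projection matrix (treating it as the identity), ignores node and edge types, and uses a hypothetical attention vector `a` and scoring vector `score_vec` in place of trained parameters.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_layer(h, neighbors, a):
    """One single-head graph attention pass (projection matrix assumed to be identity).

    h:         dict mapping node -> embedding (list of floats)
    neighbors: dict mapping node -> list of neighbor nodes (include the node
               itself to model a self-loop)
    a:         attention vector of length 2 * dim
    Returns (updated embeddings, attention weights per node).
    """
    new_h, alphas = {}, {}
    for i, nbrs in neighbors.items():
        # Unnormalized score: LeakyReLU(a . [h_i || h_j]) for each neighbor j
        scores = [
            leaky_relu(sum(w * v for w, v in zip(a, h[i] + h[j]))) for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alpha = [e / z for e in exps]
        alphas[i] = dict(zip(nbrs, alpha))
        # Attention-weighted sum of neighbor embeddings
        dim = len(h[i])
        new_h[i] = [sum(al * h[j][d] for al, j in zip(alpha, nbrs)) for d in range(dim)]
    return new_h, alphas


# Hypothetical toy graph: two feature nodes connected to one label node.
h = {"f0": [1.0, 0.0], "f1": [0.0, 1.0], "y0": [1.0, 1.0]}
neighbors = {"f0": ["f0", "y0"], "f1": ["f1", "y0"], "y0": ["y0", "f0", "f1"]}
a = [0.5, -0.5, 0.5, -0.5]  # stands in for a trained attention vector

emb, att = attention_layer(h, neighbors, a)

# Assumed scoring head: a fixed linear readout over the updated embeddings.
score_vec = [1.0, 0.0]
saliency = {
    node: sum(w * v for w, v in zip(score_vec, emb[node])) for node in ("f0", "f1")
}
ranking = sorted(saliency, key=saliency.get, reverse=True)
```

In a trained model, `a`, the projection, and the scoring head would be learned, and a type-aware variant would use separate parameters per node or edge type. Here they are fixed so that the attention and ranking steps can be followed by hand.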

Paper Structure

This paper contains 25 sections, 13 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: An example of semantic relations in MVMLFS. In text view, feature “puppy” has a similar meaning to the label “dog”, which can indicate it is an important feature for “dog”. In image view, since labels “dog” and “cat” have high semantic similarity, feature “eye” (which is informative for recognizing “dog”) may also be informative for “cat”.
  • Figure 2: Framework of our method. It consists of three main steps: (1) establishing semantic relations among features, views, and labels with an LLM; (2) constructing a two-level heterogeneous graph that combines semantic and statistical information as the structural model of the dataset; (3) employing a Graph Attention Network to learn graph embeddings, enabling the estimation of importance scores for feature nodes.
  • Figure 3: Comparison of seven methods on SCENE in terms of Average Precision, Macro-AUC, Label Ranking Average Precision, and Hamming loss.
  • Figure 4: Comparison of seven methods on Yeast in terms of Average Precision, Macro-AUC, Label Ranking Average Precision, and Hamming loss.
  • Figure 5: Ablation experimental results in terms of LRAP on six datasets with different settings of the method.