
GraphFM: Graph Factorization Machines for Feature Interaction Modeling

Shu Wu, Zekun Li, Yunyue Su, Zeyu Cui, Xiaoyu Zhang, Liang Wang

TL;DR

The proposed model integrates the interaction function of FM into the feature aggregation strategy of a graph neural network (GNN), and by stacking layers it can model arbitrary-order feature interactions on graph-structured features.

Abstract

Factorization machine (FM) is a prevalent approach to modeling pairwise (second-order) feature interactions when dealing with high-dimensional sparse data. However, on the one hand, FM fails to capture higher-order feature interactions, because modeling them directly suffers from combinatorial expansion. On the other hand, taking into account interactions between every pair of features may introduce noise and degrade prediction accuracy. To solve these problems, we propose a novel approach, Graph Factorization Machine (GraphFM), which naturally represents features in a graph structure. In particular, we design a mechanism to select the beneficial feature interactions and formulate them as edges between features. The proposed model then integrates the interaction function of FM into the feature aggregation strategy of Graph Neural Network (GNN), and can model arbitrary-order feature interactions on the graph-structured features by stacking layers. Experimental results on several real-world datasets demonstrate the rationality and effectiveness of our proposed approach. The code and data are available at https://github.com/CRIPAC-DIG/GraphCTR
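For reference, the FM term the abstract starts from sums $\langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j$ over every feature pair $i < j$, which is the all-pairs behavior the abstract argues can introduce noise. The minimal pure-Python sketch below computes that second-order term twice: once by enumerating all pairs, and once with the standard $O(nk)$ rewriting from the original FM formulation. The function names are illustrative only and are not taken from the GraphCTR repository.

```python
import itertools


def fm_pairwise_naive(x, V):
    """Second-order FM term: sum over all pairs i < j of <v_i, v_j> * x_i * x_j."""
    total = 0.0
    for i, j in itertools.combinations(range(len(x)), 2):
        total += sum(p * q for p, q in zip(V[i], V[j])) * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity in O(n * k) via the identity
    0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


# Sparse one-hot style input: only features 0 and 2 are active.
x = [1.0, 0.0, 1.0]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # latent vectors v_i with k = 2
print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))  # → 17.0 17.0
```

Only the pair (0, 2) contributes here, giving $\langle (1,2), (5,6) \rangle = 17$. Both functions still treat every pair as a candidate interaction; restricting which pairs interact is the part GraphFM changes.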

Paper Structure

This paper contains 25 sections, 18 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: The overview of GraphFM. The input features are modeled as a graph, where nodes are feature fields, and edges are interactions. At each layer of GraphFM, the edges (beneficial interactions) are first selected by the interaction selection component. Then these selected feature interactions are aggregated via the attention net to update feature embeddings in the interaction aggregation component. The learned feature embeddings at every layer are used for the final prediction jointly.
  • Figure 2: The performance comparison of GraphFM with different components on the Criteo and MovieLens-1M datasets. Further analysis is provided in the ablation study section.
  • Figure 3: Model performance with respect to the size of the sampled neighborhood, where the "neighborhood sample size" refers to the number of neighbors sampled at each depth for $K = 3$, with $m_1 = n$ fixed and $m_2$, $m_3$ varied.
  • Figure 4: Heat maps of estimated edge weights for two correctly predicted instances (a, b) and one wrongly predicted instance (c) on the MovieLens-1M dataset, where positive edge weights indicate beneficial feature interactions. The axes represent feature fields (Gender, Age, Occupation, Zipcode, ReleaseTime, WatchTime, Genre).
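Figure 1 describes each GraphFM layer as interaction selection followed by attention-based interaction aggregation, with the embeddings from every layer used jointly for the prediction, and Figure 3 varies the per-depth neighborhood sizes $m_1, \dots, m_K$. The pure-Python sketch below illustrates that pipeline under loudly stated assumptions. The edge scorer sigmoid(w · (e_i ⊙ e_j)), the attention vector `a`, the sum-pooling readout, sharing `w` and `a` across layers, and excluding self-loops are all hypothetical simplifications, not the authors' GraphCTR implementation. The sketch only shows how the FM-style element-wise product e_i ⊙ e_j can act as the message in a GNN aggregation step over a selected neighborhood.

```python
import math
import random


def hadamard(a, b):
    # Element-wise product e_i * e_j: FM's pairwise interaction before summing.
    return [x * y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def select_neighbors(E, w, m):
    """Interaction selection (hypothetical scorer): score each pair (i, j) with
    sigmoid(w . (e_i * e_j)) and keep the m highest-scoring neighbors of node i."""
    n = len(E)
    m = min(m, n - 1)  # assumption: no self-loops, so at most n - 1 neighbors
    neighbors = []
    for i in range(n):
        scored = [(sigmoid(dot(w, hadamard(E[i], E[j]))), j) for j in range(n) if j != i]
        scored.sort(reverse=True)
        neighbors.append([j for _, j in scored[:m]])
    return neighbors


def aggregate(E, neighbors, a):
    """Interaction aggregation: softmax attention over the selected neighbors'
    interactions e_i * e_j, weighted sum, then ReLU (vector `a` is hypothetical)."""
    out = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:  # isolated node: no interactions to aggregate
            out.append([0.0] * len(E[i]))
            continue
        inters = [hadamard(E[i], E[j]) for j in nbrs]
        logits = [dot(a, v) for v in inters]
        top = max(logits)
        weights = [math.exp(s - top) for s in logits]  # numerically stable softmax
        total = sum(weights)
        agg = [sum(wt / total * v[f] for wt, v in zip(weights, inters)) for f in range(len(E[i]))]
        out.append([max(0.0, x) for x in agg])
    return out


def graphfm_sketch(E, schedule, w, a, readout):
    """Stack one layer per entry of `schedule` (schedule[k-1] plays the role of m_k).
    Each layer's sum-pooled embeddings are concatenated for the final prediction."""
    pooled_per_layer = []
    for m in schedule:
        E = aggregate(E, select_neighbors(E, w, m), a)
        pooled_per_layer.append([sum(col) for col in zip(*E)])  # sum-pool over fields
    features = [x for pooled in pooled_per_layer for x in pooled]
    return sigmoid(dot(readout, features))


# Usage with random, made-up parameters; w and a are shared across layers for brevity.
random.seed(0)
n, d = 7, 4  # 7 fields, matching the MovieLens-1M fields listed for Figure 4
E0 = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
w = [random.uniform(-1, 1) for _ in range(d)]
a = [random.uniform(-1, 1) for _ in range(d)]
schedule = [n, 3, 2]  # K = 3 with m_1 = n, then smaller m_2 and m_3
readout = [random.uniform(-1, 1) for _ in range(d * len(schedule))]
p = graphfm_sketch(E0, schedule, w, a, readout)
print(f"predicted probability: {p:.4f}")
```

Design note: the hard top-m cut keeps the sketch readable, and sharing `w` and `a` across depths keeps it short; a trained model would more plausibly learn separate parameters per layer.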