Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution

Cong Xu; Jun Wang; Jianyong Wang; Wei Zhang

TL;DR

Structure-aware Embedding Evolution (SEvo for short) is a novel embedding update mechanism that encourages related nodes to evolve similarly at each step and directly injects graph structural information into embeddings with minimal computational overhead during training.

Abstract

Embeddings play a key role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision-making models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike a GNN (Graph Neural Network), which typically serves as an intermediate module, SEvo directly injects graph structural information into embeddings with minimal computational overhead during training. The convergence properties of SEvo, along with those of its potential variants, are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. In particular, SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structural information beyond explicit GNN modules.

Paper Structure (33 sections, 17 theorems, 84 equations, 7 figures, 8 tables, 4 algorithms)

Introduction
Structure-aware Embedding Evolution
Preliminaries
Methodology
Convergence Analysis for Further Modification
Integrating SEvo into Existing Optimizers
Experiments
Overall Comparison
Empirical Analysis
Ablation Study
Applications of SEvo Beyond Interaction Data
Related Work
Conclusion and Future Work
Proofs
Proof of Theorem \ref{theorem-smh-cvg}
...and 18 more sections

Key Result

Theorem 1

The iterative approximation is direction-aware for all possible normalized adjacency matrices and $L \ge 0$, if and only if $\beta < 1/2$. In contrast, the Neumann series approximation $\hat{\psi}_{nsa}(\Delta \mathbf{E}) = (1 - \beta) \sum_{l=0}^L \beta^l\tilde{\mathbf{A}}^l \Delta \mathbf{E}$ is s
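The Neumann series approximation named in Theorem 1 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it evaluates $(1 - \beta) \sum_{l=0}^L \beta^l \tilde{\mathbf{A}}^l \Delta \mathbf{E}$ with repeated matrix products. The function names `normalized_adjacency` and `neumann_series_approx` are hypothetical, and the symmetric normalization $\mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}$ is an assumption made here for illustration, since this summary does not specify how $\tilde{\mathbf{A}}$ is built.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}.

    Assumption for illustration only: the summary does not state which
    normalization the paper uses for the adjacency matrix.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep weight 0
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def neumann_series_approx(delta_E, A_norm, beta, L):
    """Evaluate (1 - beta) * sum_{l=0}^{L} beta^l * A_norm^l @ delta_E."""
    term = np.array(delta_E, dtype=float)  # l = 0 term: delta_E itself
    total = term.copy()
    for _ in range(L):
        term = beta * (A_norm @ term)  # next term: beta^l * A_norm^l @ delta_E
        total += term
    return (1.0 - beta) * total


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 with a 2-dimensional update per node.
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    delta_E = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
    smoothed = neumann_series_approx(delta_E, normalized_adjacency(A), beta=0.3, L=3)
    print(smoothed)
```

With $L = 0$ the sketch returns $(1 - \beta) \Delta \mathbf{E}$, and as $L$ grows the sum approaches $(1 - \beta)(\mathbf{I} - \beta \tilde{\mathbf{A}})^{-1} \Delta \mathbf{E}$, which is the standard limit of a Neumann series when $\beta < 1$ and the eigenvalues of $\tilde{\mathbf{A}}$ lie in $[-1, 1]$. Each additional term costs one product with $\tilde{\mathbf{A}}$, which is cheap when the matrix is sparse.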

Figures (7)

Figure 1: Overview of SEvo. (a) Normal embedding evolution. (b) (Section \ref{section-SEvo}) Structure-aware embedding evolution. (c) (Section \ref{section-SEvo-method}) Geometric visualization of the variation from $\Delta \mathbf{E}$ to $\psi^*(\Delta \mathbf{E})$. The gray ellipse represents the region with proper smoothness. (d) (Section \ref{section-convergence}) The $L$-layer approximation with a faster convergence guarantee.
Figure 2: Empirical illustrations of convergence and smoothness. The top and bottom panels respectively depict the results for Beauty and MovieLens-1M. (a) SASRec enhanced by SEvo with or without rescaling. (b) Smoothness of (I) the original variation; (II) the smoothed variation; (III) the optimized embedding. A lower $\mathcal{J}_{smoothness}$ indicates stronger smoothness.
Figure 3: SEvo ablation experiments.
Figure 4: Illustrations of different pairwise similarity estimation methods based on interaction data. (a) The default is to adopt the co-occurrence frequency within the last $K$ items. (b) Using only the first $K$ items. (c) Allowing a maximum walk length of $H$ beyond 1. (d) Frequency-based similarity versus distance-based similarity.
Figure 5: Comparison of similarity estimation across four potential factors. '$\star$' indicates the default way applied to SEvo-enhanced sequence models in Section \ref{section-comparison}. (a) Using only the first/last $K$ items for pairwise similarity estimation. (b) Frequency- and distance-based similarity with a maximum walk length of $H$.
...and 2 more figures

Theorems & Definitions (34)

Definition 1: Structure-aware transformation
Definition 2: Direction-aware transformation
Theorem 1
Theorem 2: Informal
Remark 1
Proposition 1
Theorem 3
proof
Lemma 1
proof
...and 24 more
