Lightweight Embeddings with Graph Rewiring for Collaborative Filtering

Xurong Liang, Tong Chen, Wei Yuan, Hongzhi Yin

TL;DR

LERG tackles the dual bottlenecks of on-device GNN-based recommenders: embedding storage and graph-propagation cost. It integrates a quantized compositional embedding table with a sparsified, rewired propagation graph and a lightweight on-device fine-tuning pipeline, enabling competitive recommendations under tight hardware budgets. Key innovations include LSQ-based quantization of meta-embeddings, a fixed assignment matrix, BIP-guided graph rewiring, and placeholder embeddings for pruned entities, all orchestrated to minimize memory and MACs while preserving performance. The approach demonstrates strong results across three benchmarks, including industry-scale data, signaling practical impact for edge deployment of graph-based recommenders. The work thus offers a scalable pathway for high-quality recommendations on resource-constrained devices, reducing reliance on cloud-centric inference and enabling faster, privacy-preserving personalization.
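To make the idea of a quantized compositional embedding table concrete, the following is a minimal sketch, not the authors' implementation. It assumes each entity is described by a sparse row of the fixed assignment matrix (here a hypothetical dict mapping meta-embedding index to weight) and that meta-embeddings are stored as signed integers with one shared step size. In LSQ the step size is learned during training; here it is a fixed constant for illustration, and the function names `quantize`, `dequantize`, and `compose` are invented for this example.

```python
def quantize(row, step, bits=8):
    """Uniform symmetric quantization: round(x / step), clipped to the signed range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(x / step))) for x in row]


def dequantize(q_row, step):
    """Map stored integers back to approximate real values."""
    return [q * step for q in q_row]


def compose(assignment, q_meta, step):
    """Build one entity embedding as a weighted sum of dequantized meta-embeddings.

    assignment: dict {meta_index: weight}, an assumed sparse row of the
    fixed assignment matrix.
    """
    dim = len(q_meta[0])
    out = [0.0] * dim
    for idx, weight in assignment.items():
        meta = dequantize(q_meta[idx], step)
        for d in range(dim):
            out[d] += weight * meta[d]
    return out


# Toy codebook with three 2-dimensional meta-embeddings (illustrative values).
meta = [[0.10, -0.20], [0.30, 0.05], [-0.15, 0.25]]
step = 0.05  # assumed fixed here; LSQ would learn it
q_meta = [quantize(row, step) for row in meta]

# A hypothetical user assigned to meta-embeddings 0 and 2 with equal weight.
user_emb = compose({0: 0.5, 2: 0.5}, q_meta, step)
```

Only the integer codebook, the step size, and the sparse assignments need to be stored, which is why lower-precision meta-embeddings allow more of them to fit in the same storage budget.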

Abstract

As recommendation services scale rapidly and their deployment now commonly involves resource-constrained edge devices, GNN-based recommender systems face significant challenges, including high embedding storage costs and runtime latency from graph propagation. Our previous work, LEGCF, effectively reduced embedding storage costs but struggled to maintain recommendation performance under stricter storage limits. Additionally, LEGCF did not address the extensive runtime computation cost of graph propagation, which involves heavy multiply-accumulate operations (MACs). These challenges consequently hinder effective training and inference on resource-constrained edge devices. To address these limitations, we propose Lightweight Embeddings with Rewired Graph for Graph Collaborative Filtering (LERG), an improved extension of LEGCF. LERG retains LEGCF's compositional codebook structure but introduces quantization techniques to reduce the storage cost, enabling the inclusion of more meta-embeddings within the same storage budget. To optimize graph propagation, we pretrain the quantized compositional embedding table using the full interaction graph on resource-rich servers, after which a fine-tuning stage identifies and prunes low-contribution entities via a gradient-free binary integer programming approach, constructing a rewired graph that excludes these entities (i.e., user/item nodes) from propagating signals. The quantized compositional embedding table with selective embedding participation and the sparse rewired graph are then transferred to edge devices, which significantly reduces memory consumption and inference time. Experiments on three public benchmark datasets, including an industry-scale dataset, demonstrate that LERG achieves superior recommendation performance while dramatically reducing storage and computation costs for graph-based recommendation services.
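The link between pruning entities and propagation cost can be illustrated with a rough MAC count. The sketch below is an assumption-laden illustration rather than the paper's procedure: it uses the standard estimate that one sparse neighbor-aggregation layer costs one MAC per nonzero adjacency entry per embedding dimension, it simply drops every edge touching a pruned node, and it does not model the edges that graph rewiring adds back or the binary integer program that selects which entities to prune. The helper names and the toy graph are hypothetical.

```python
def propagation_macs(edges, dim, layers=1):
    """Estimate MACs for sparse neighbor aggregation on a bipartite graph.

    Each undirected user-item edge yields two nonzeros in the symmetric
    adjacency matrix, and each nonzero costs `dim` MACs per layer.
    """
    return 2 * len(edges) * dim * layers


def drop_pruned(edges, pruned):
    """Remove every edge incident to a pruned user or item node."""
    return [(u, i) for (u, i) in edges if u not in pruned and i not in pruned]


# Toy interaction graph (illustrative only).
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i2"), ("u3", "i3")]

full_macs = propagation_macs(edges, dim=64, layers=3)

# Suppose "u3" was judged low-contribution and excluded from propagation.
sparse_edges = drop_pruned(edges, pruned={"u3"})
sparse_macs = propagation_macs(sparse_edges, dim=64, layers=3)
```

In this toy case removing one node eliminates two of five edges, so the estimated propagation cost falls proportionally; the actual savings in LERG depend on the retention ratio and on how many edges rewiring introduces.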

Paper Structure

This paper contains 23 sections, 16 equations, 4 figures, 6 tables, 3 algorithms.

Figures (4)

  • Figure 1: The overall workflow of LERG. (a) corresponds to the quantized compositional embedding table pretraining stage described in Sec. \ref{sec:pretrain}; (b) corresponds to graph rewiring for graph sparsification in Sec. \ref{sec:graph_rewiring}; (c) corresponds to pruned entity embedding imputation in Sec. \ref{sec:finetuning}; (d) corresponds to entity embedding generation in the inference and fine-tuning stages described in Sec. \ref{sec:finetuning}. Stages (a), (b), and (c) are all conducted on the resource-rich server side, whereas the fine-tuning stage in (d) can be conducted either on-server or on-edge. The red dotted arrows in (b) indicate the edges constructed between $v_4$ and its indirect neighbors after the graph rewiring process.
  • Figure 2: The plots on the left-hand side show the recommendation performance of LERG w.r.t. different retention ratios. The plots on the right-hand side depict the relationship between retention ratios and the computational efficiency of our algorithm. The trend of elapsed time per training epoch (in seconds) is shown in purple. The trend of MACs (in billions) incurred during graph propagation is shown in green. We record the training time per epoch as an indicator of time efficiency to align with the potential need for fine-tuning the model directly on-edge once deployed.
  • Figure 3: The performance of LERG w.r.t. various hyperparameter settings.
  • Figure 4: The performance of LERG when using different quantization precisions for $\Bar{\textbf{E}}_{meta}$.