Spatial-Temporal Knowledge Distillation for Takeaway Recommendation

Shuyuan Zhao, Wei Chen, Boyan Shi, Liyong Zhou, Shuohao Lin, Huaiyu Wan

TL;DR

This work tackles takeaway recommendation under data sparsity and complex geospatial dynamics by introducing STKDRec, a two-stage framework that first pre-trains a Spatial-Temporal Knowledge Graph (STKG) encoder as the teacher and then distills its knowledge into a Spatial-Temporal Transformer (ST-Transformer) as the student. The STKG encoder learns high-order spatial-temporal dependencies and generates soft labels, while the ST-Transformer models dynamic user preferences from spatially enhanced sequences. Knowledge is transferred through a distillation objective that combines a KL-divergence loss and a prediction loss, controlled by a temperature parameter $\tau$ and a balance factor $\alpha$. Key contributions include the STKG encoder with subgraph sampling and GNN aggregation, the ST-Transformer with spatial-enhanced sequence representations and spatial-temporal context attention, and the spatial-temporal knowledge distillation (STKD) strategy, which fuses heterogeneous knowledge with reduced computational overhead. Experiments on three real-world datasets show that STKDRec outperforms competitive baselines, supporting its effectiveness for real-world takeaway platforms.
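As a rough illustration of how a distillation objective of this kind can be put together, the sketch below combines a temperature-scaled KL-divergence term with a cross-entropy prediction term. It follows the common knowledge-distillation recipe rather than the paper's exact equations, which this summary does not give. The function names, the $\tau^2$ scaling, and the choice to weight the KL term by $\alpha$ and the prediction term by $1-\alpha$ are all assumptions made for illustration.

```python
import math


def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / tau for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, target_index,
                      tau=2.0, alpha=0.5):
    """Hypothetical combined loss: alpha * KD term + (1 - alpha) * prediction term.

    Assumption: the teacher's soft labels are its softened score distribution
    over candidate items; the weighting convention is illustrative only.
    """
    # Soft labels from the teacher and softened student predictions.
    teacher_soft = softmax(teacher_logits, tau)
    student_soft = softmax(student_logits, tau)
    # The tau**2 factor is the usual gradient-scale correction (an assumption here).
    kd_term = (tau ** 2) * kl_divergence(teacher_soft, student_soft)

    # Prediction loss: cross-entropy of the student against the true next item.
    student_probs = softmax(student_logits)
    ce_term = -math.log(student_probs[target_index] + 1e-12)

    return alpha * kd_term + (1.0 - alpha) * ce_term


if __name__ == "__main__":
    teacher = [0.2, 1.5, 3.0]   # made-up teacher scores for three candidate items
    student = [0.1, 1.0, 2.0]   # made-up student scores for the same items
    print(distillation_loss(student, teacher, target_index=2, tau=2.0, alpha=0.5))
```

In this sketch, a larger $\tau$ flattens both distributions so the student also learns from the teacher's lower-ranked items, while $\alpha$ trades off imitation of the teacher against fitting the observed purchases.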

Abstract

Takeaway recommendation systems aim to recommend users' future takeaway purchases based on their historical purchase behaviors, thereby improving user satisfaction and boosting merchant sales. Existing methods focus on incorporating auxiliary information or leveraging knowledge graphs to alleviate the sparsity of user purchase sequences. However, two main challenges limit the performance of these approaches: (1) capturing dynamic user preferences on complex geospatial information and (2) efficiently integrating spatial-temporal knowledge from both graph and sequence data at low computational cost. In this paper, we propose a novel spatial-temporal knowledge distillation model for takeaway recommendation (STKDRec) based on a two-stage training process. Specifically, during the first stage (pre-training), a spatial-temporal knowledge graph (STKG) encoder is trained to extract high-order spatial-temporal dependencies and collaborative associations from the STKG. During the second stage, spatial-temporal knowledge distillation (STKD), a spatial-temporal Transformer (ST-Transformer) is employed to comprehensively model dynamic user preferences on various types of fine-grained geospatial information from a sequential perspective. Furthermore, the STKD strategy is introduced to transfer graph-based spatial-temporal knowledge to the ST-Transformer, facilitating the adaptive fusion of rich knowledge derived from both the STKG and sequence data while reducing computational overhead. Extensive experiments on three real-world datasets show that STKDRec significantly outperforms state-of-the-art baselines.

Paper Structure

This paper contains 28 sections, 13 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: An illustrative example highlighting the importance of capturing dynamic user preferences on complex geospatial information.
  • Figure 2: The overall architecture of STKDRec, consisting of two stages: the pre-training stage and the STKD stage.
  • Figure 3: Study on different knowledge fusion methods. Multi refers to multiplication, Cat refers to concatenation, Add refers to addition, STKD denotes our proposed strategy, and Time indicates model training and prediction duration.
  • Figure 4: Study on different values of the temperature $\tau$ and the number of sampled neighbor nodes $s$.
  • Figure 5: Visualization of case study on recommendation results.