
MTFM: A Scalable and Alignment-free Foundation Model for Industrial Recommendation in Meituan

Xin Song, Zhilin Guan, Ruidong Han, Binghao Tang, Tianwen Chen, Bing Li, Zihao Li, Han Zhang, Fei Jiang, Chaolin Xie, Chi Ma, Chunyang Jiang, Chunzhen Jing, Dengxuan Li, Fengyi Li, Lei Yu, Mengyao Sun, Pu Wang, Qing Wang, Rui Fan, Shangyu Chen, Shifeng Du, Siyuan Bai, Wei Lin, Wentao Zhu, Zhou Han, Zhuo Chen, Zikang Xu

TL;DR

MTFM addresses the challenge of scalable, extensible, and efficient cross-domain, multi-scenario recommendation in industrial settings by introducing heterogeneous tokenization, which removes the need for input alignment. It combines a transformer-based backbone with Hybrid Target Attention and Grouped-Query Attention to balance modeling capacity and computational efficiency, complemented by system-level optimizations and a multi-scenario user-level data pipeline. Offline and online evaluations at Meituan show state-of-the-art gains on CTR and conversion metrics, along with throughput and latency improvements that make production-scale deployment feasible. The work shows that alignment-free, cross-scenario learning, combined with co-designed training and deployment, can unlock the scaling laws of multi-scenario data for practical, large-scale recommender systems.
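
Grouped-Query Attention (GQA) is a standard technique in which several query heads share one key/value head, which reduces the memory needed for keys and values. The sketch below is a minimal, pure-Python illustration of generic GQA, not MTFM's implementation; the function names, head counts, and toy tensors are assumptions chosen only to make the sharing pattern visible.

```python
import math
from typing import List

# A "matrix" here is a list of rows: [seq_len][dim].
Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Unmasked scaled dot-product attention for a single head."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def grouped_query_attention(queries: List[Matrix], keys: List[Matrix],
                            values: List[Matrix]) -> List[Matrix]:
    """GQA: n_q query heads share n_kv key/value heads (n_q % n_kv == 0).

    Query head h reads KV head h // (n_q // n_kv). n_kv == n_q recovers
    standard multi-head attention; n_kv == 1 is multi-query attention.
    """
    n_q, n_kv = len(queries), len(keys)
    if n_kv != len(values) or n_q % n_kv != 0:
        raise ValueError("need len(keys) == len(values) dividing len(queries)")
    group_size = n_q // n_kv
    return [attend(queries[h], keys[h // group_size], values[h // group_size])
            for h in range(n_q)]


# Usage: 4 query heads share 2 KV heads, so only half as many K/V tensors
# are stored as in multi-head attention with 4 KV heads.
q = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(4)]   # 4 heads, 2 tokens, dim 2
k = [[[1.0, 2.0], [0.5, -1.0], [0.0, 1.0]],          # 2 KV heads, 3 tokens
     [[2.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]]
v = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     [[3.0, 1.0], [0.0, 2.0], [1.0, -1.0]]]
out = grouped_query_attention(q, k, v)
print(len(out), len(out[0]), len(out[0][0]))  # 4 2 2
```

Query head `h` reads key/value head `h // group_size`, so here heads 0 and 1 share one K/V pair and heads 2 and 3 share the other. A production system would use batched tensor kernels; the explicit loops only expose the indexing.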

Abstract

Industrial recommendation systems typically involve multiple scenarios, yet existing cross-domain recommendation (CDR) and multi-scenario recommendation (MSR) methods often require prohibitive resources and strict input alignment, limiting their extensibility. We propose MTFM (Meituan Foundation Model for Recommendation), a transformer-based framework that addresses these challenges. Instead of pre-aligning inputs, MTFM transforms cross-domain data into heterogeneous tokens, capturing multi-scenario knowledge in an alignment-free manner. To improve efficiency, we first introduce multi-scenario user-level sample aggregation, which significantly increases training throughput by reducing the total number of training instances. We further integrate Grouped-Query Attention and a customized Hybrid Target Attention to reduce memory usage and computational complexity. In addition, we implement system-level optimizations, such as kernel fusion and the elimination of CPU-GPU blocking, to raise both training and inference throughput. Offline and online experiments validate the effectiveness of MTFM and show that scaling both model capacity and multi-scenario training data yields significant performance gains.
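
The abstract states that user-level sample aggregation raises throughput by reducing the number of training instances. One plausible reading, sketched below under that assumption, is to merge the impression-level records that belong to the same user, across all scenarios, into a single instance, so the user's shared context is processed once per user rather than once per impression. The `Impression` and `UserSample` schemas, the scenario names, and the `aggregate_by_user` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Impression:
    """One impression-level training record (hypothetical schema)."""
    user_id: str
    scenario: str
    item_id: str
    label: int


@dataclass
class UserSample:
    """One user-level training instance holding targets from every scenario."""
    user_id: str
    targets: List[Impression] = field(default_factory=list)

    def scenarios(self) -> List[str]:
        """Sorted list of distinct scenarios covered by this instance."""
        return sorted({t.scenario for t in self.targets})


def aggregate_by_user(impressions: List[Impression]) -> List[UserSample]:
    """Group impressions by user; dict order keeps first-seen user order."""
    grouped: Dict[str, UserSample] = {}
    for imp in impressions:
        if imp.user_id not in grouped:
            grouped[imp.user_id] = UserSample(imp.user_id)
        grouped[imp.user_id].targets.append(imp)
    return list(grouped.values())


# Usage with made-up records from two hypothetical scenarios.
logs = [
    Impression("u1", "scenario_A", "i1", 1),
    Impression("u1", "scenario_A", "i2", 0),
    Impression("u1", "scenario_B", "i3", 0),
    Impression("u2", "scenario_B", "i4", 1),
    Impression("u2", "scenario_A", "i5", 0),
]
samples = aggregate_by_user(logs)
print(len(logs), "->", len(samples))  # 5 -> 2
```

Five impression-level rows collapse into two user-level instances here; the real reduction would depend on how many impressions each user accumulates within a training window.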

Paper Structure (27 sections, 9 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The Model Architecture of MTFM
  • Figure 2: An Example of Dynamic Mask
  • Figure 3: Data Pipeline of MTFM
  • Figure 4: Scalability analysis of MTFM.
  • Figure 5: Attention heatmap visualization for MTFM. The horizontal and vertical axes represent T-tokens and H-tokens, respectively.
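
Figure 2 refers to a dynamic mask, but its rules are not given in this summary. When several candidate targets are packed into one user-level sequence, a common safeguard is a mask that lets each target read the user's history and itself while hiding the other targets, so one candidate cannot leak information into another's prediction. The sketch below builds such a mask under that assumption; it is a generic illustration, not MTFM's Hybrid Target Attention, and the history-then-targets layout is an assumption that may not match the T/H-tokens of Figure 5. It is "dynamic" only in the sense that it is rebuilt per sample from that sample's history and target counts.

```python
from typing import List


def target_attention_mask(num_history: int, num_targets: int) -> List[List[bool]]:
    """mask[i][j] is True when query token i may attend to key token j.

    Assumed layout: positions [0, num_history) are history tokens and
    positions [num_history, num_history + num_targets) are candidate targets.
    """
    n = num_history + num_targets
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(num_history):
            mask[i][j] = True  # every token may read the shared history
        if i >= num_history:
            mask[i][i] = True  # a target also sees itself, but no other target
    return mask


# Usage: 3 history tokens followed by 2 targets ("1" = allowed, "." = masked).
m = target_attention_mask(3, 2)
for row in m:
    print("".join("1" if allowed else "." for allowed in row))
# 111..
# 111..
# 111..
# 1111.
# 111.1
```

Each target row has `num_history + 1` allowed entries instead of `num_history + num_targets`, so an attention kernel that skips fully masked blocks could in principle avoid the target-to-target work; whether MTFM exploits this is not stated in this summary.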