MTFM: A Scalable and Alignment-free Foundation Model for Industrial Recommendation in Meituan
Xin Song, Zhilin Guan, Ruidong Han, Binghao Tang, Tianwen Chen, Bing Li, Zihao Li, Han Zhang, Fei Jiang, Chaolin Xie, Chi Ma, Chunyang Jiang, Chunzhen Jing, Dengxuan Li, Fengyi Li, Lei Yu, Mengyao Sun, Pu Wang, Qing Wang, Rui Fan, Shangyu Chen, Shifeng Du, Siyuan Bai, Wei Lin, Wentao Zhu, Zhou Han, Zhuo Chen, Zikang Xu
TL;DR
MTFM targets scalable, extensible, and efficient cross-domain, multi-scenario recommendation in industrial settings. Its heterogeneous tokenization removes the need for input alignment. A transformer-based backbone with Hybrid Target Attention and Grouped-Query Attention balances modeling capacity against computational cost, and is complemented by system-level optimizations and a multi-scenario user-level data pipeline. Offline and online evaluations at Meituan demonstrate state-of-the-art gains on CTR and conversion metrics, along with throughput and latency improvements that make production-scale deployment practical. The work shows that alignment-free, cross-scenario learning, co-designed with training and deployment, can unlock the scaling laws of multi-scenario data for practical, large-scale recommender systems.
Abstract
Industrial recommendation systems typically serve multiple scenarios, yet existing cross-domain recommendation (CDR) and multi-scenario recommendation (MSR) methods often require prohibitive resources and strict input alignment, which limits their extensibility. We propose MTFM (Meituan Foundation Model for Recommendation), a transformer-based framework that addresses these challenges. Instead of pre-aligning inputs, MTFM transforms cross-domain data into heterogeneous tokens, capturing multi-scenario knowledge in an alignment-free manner. To improve efficiency, we first introduce multi-scenario user-level sample aggregation, which significantly raises training throughput by reducing the total number of training instances. We then integrate Grouped-Query Attention and a customized Hybrid Target Attention to reduce memory usage and computational complexity. Finally, we implement system-level optimizations, such as kernel fusion and the elimination of CPU-GPU blocking, to increase both training and inference throughput. Offline and online experiments validate the effectiveness of MTFM and demonstrate that scaling both model capacity and multi-scenario training data yields significant performance gains.
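
To make the instance-count argument concrete, the following is a minimal sketch of user-level sample aggregation, not the authors' pipeline. It assumes each raw sample is a dict with hypothetical fields `user_id`, `scenario`, `item_id`, `ts`, and `label`; the function name `aggregate_by_user` and the scenario names are illustrative only. Per-impression samples from all scenarios are grouped by user, so one training instance carries every target for that user.

```python
from collections import defaultdict


def aggregate_by_user(samples):
    """Merge per-impression samples into one instance per user.

    Assumption: each sample is a dict with the hypothetical keys
    user_id, scenario, item_id, ts, and label.
    """
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["user_id"]].append(sample)

    instances = []
    for user_id, events in grouped.items():
        # Keep chronological order so targets follow the user's timeline.
        events.sort(key=lambda e: e["ts"])
        instances.append({
            "user_id": user_id,
            # Targets from different scenarios sit side by side in one instance.
            "targets": [(e["scenario"], e["item_id"], e["label"]) for e in events],
        })
    return instances


# Illustrative data: four impression-level samples from two scenarios.
samples = [
    {"user_id": "u1", "scenario": "food_delivery", "item_id": "i9", "ts": 3, "label": 1},
    {"user_id": "u2", "scenario": "hotel", "item_id": "i4", "ts": 1, "label": 0},
    {"user_id": "u1", "scenario": "hotel", "item_id": "i2", "ts": 1, "label": 0},
    {"user_id": "u1", "scenario": "food_delivery", "item_id": "i7", "ts": 2, "label": 0},
]
instances = aggregate_by_user(samples)
print(len(samples), "samples ->", len(instances), "instances")
```

In this toy input, four impression-level samples collapse into two user-level instances. In a sketch like this, user-side features would be read and encoded once per user rather than once per impression, which is one plausible reading of how fewer instances translate into higher training throughput.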
