HoMer: Addressing Heterogeneities by Modeling Sequential and Set-wise Contexts for CTR Prediction

Shuwei Chen; Jiajun Cui; Zhengqi Xu; Fan Zhang; Jiangke Fan; Teng Zhang; Xingxing Wang

TL;DR

HoMer addresses three forms of heterogeneity in CTR prediction by unifying panoramic sequence modeling with set-wise cross-item interactions in a homogeneous Transformer. The panoramic sequence aligns rich non-sequential features with the behavior history to produce a fine-grained user interest representation, while the set-wise decoder captures cross-item and user-item interactions across the entire exposed item set in a single model invocation. On Meituan's search ads, HoMer improves AUC by 0.0099 over the industrial baseline, raises online CTR and RPM by 1.99% and 2.46%, respectively, and reduces GPU usage by 27% through kernel fusion and shared computation. The results indicate strong offline and online performance, practical deployment efficiency, and scalability for large-scale industrial recommender systems.

Abstract

Click-through rate (CTR) prediction, which models behavior sequences and non-sequential features (e.g., user/item profiles or cross features) to infer user interest, underpins industrial recommender systems. However, most methods face three forms of heterogeneity that degrade predictive performance: (i) Feature Heterogeneity arises when the limited side features attached to sequence items yield a coarser interest representation than the extensive non-sequential features, thereby impairing sequence modeling; (ii) Context Heterogeneity arises because a user's interest in an item is influenced by other items, yet point-wise prediction neglects the cross-item interaction context of the entire item set; (iii) Architecture Heterogeneity stems from the fragmented integration of specialized network modules, which compromises the model's effectiveness, efficiency, and scalability in industrial deployments. To tackle these limitations, we propose HoMer, a Homogeneous-Oriented TransforMer for modeling sequential and set-wise contexts. First, we align sequence side features with non-sequential features for accurate sequence modeling and fine-grained interest representation. Second, we shift the prediction paradigm from point-wise to set-wise, enabling cross-item interaction in a highly parallel manner. Third, HoMer's unified encoder-decoder architecture achieves dual optimization through structural simplification and shared computation, ensuring computational efficiency while maintaining scalability with model size. Without arduous modification to the prediction pipeline, HoMer successfully scales up and outperforms our industrial baseline by 0.0099 in AUC, and improves the online business metrics CTR and RPM by 1.99% and 2.46%, respectively. Additionally, HoMer saves 27% of GPU resources via preliminary engineering optimization, further validating its practicality.
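
The shift from point-wise to set-wise prediction described in the abstract can be illustrated with a toy sketch. This is not HoMer's decoder: it assumes hypothetical, already-computed embedding vectors for the user and the candidate items, omits all learned projections, and uses a single unparameterized attention step purely to show how an item's score can come to depend on the other items in the exposed set. The function names `pointwise_scores` and `setwise_scores` are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pointwise_scores(user, items):
    # Each candidate is scored independently of the other candidates.
    return [sigmoid(dot(user, item)) for item in items]


def setwise_scores(user, items):
    # Assumption: one attention step with no learned weights. Each candidate
    # attends to every candidate in the exposed set (itself included), so its
    # representation reflects cross-item context before it is scored.
    scale = math.sqrt(len(user))
    scores = []
    for query in items:
        weights = softmax([dot(query, key) / scale for key in items])
        context = [
            sum(w * key[d] for w, key in zip(weights, items))
            for d in range(len(query))
        ]
        mixed = [q + c for q, c in zip(query, context)]  # residual connection
        scores.append(sigmoid(dot(user, mixed)))
    return scores


if __name__ == "__main__":
    user = [1.0, 0.0]
    target = [0.5, 0.5]
    set_a = [target, [1.0, 0.0]]
    set_b = [target, [-1.0, 0.0]]
    # The point-wise score of `target` ignores its neighbours;
    # the set-wise score changes when the neighbouring item changes.
    print(pointwise_scores(user, set_a)[0], pointwise_scores(user, set_b)[0])
    print(setwise_scores(user, set_a)[0], setwise_scores(user, set_b)[0])
```

In this sketch all candidates in the set are scored in one call, which mirrors the single-invocation property mentioned in the TL;DR; a real implementation would batch the attention as matrix operations rather than Python loops.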
