HHFT: Hierarchical Heterogeneous Feature Transformer for Recommendation Systems
Liren Yu, Wenming Zhang, Silu Zhou, Tao Zhang, Zhixuan Zhang, Dan Ou
TL;DR
HHFT is a hierarchical heterogeneous feature Transformer for CTR prediction in ranking systems. It combines semantic feature partitioning, block-specific Transformer encoders, and Hiformer layers to model high-order interactions explicitly while preserving feature semantics. The architecture scales well: empirical scaling laws show that gains come mainly from increasing width and high-order interaction capacity. HHFT improves CTR AUC by +0.4% over DNN baselines and, deployed on Taobao, increases GMV by +0.6%. Offline experiments and ablations confirm the contributions of semantic partitioning, heterogeneous parameterization, and the Hiformer components. The work provides a practical, deployment-ready framework for industrial recommender systems and outlines future directions toward joint ranking across search, recommendation, and advertising.
Abstract
We propose HHFT (Hierarchical Heterogeneous Feature Transformer), a Transformer-based architecture tailored for industrial CTR prediction. HHFT addresses the limitations of DNNs through three key designs: (1) Semantic Feature Partitioning, which groups heterogeneous features (e.g., user profile, item information, behaviour sequence) into semantically coherent blocks to preserve domain-specific information; (2) a Heterogeneous Transformer Encoder, which adopts block-specific QKV projections and FFNs to avoid semantic confusion between distinct feature types; and (3) a Hiformer Layer, which captures high-order interactions across features. Our findings reveal that Transformers significantly outperform DNN baselines, achieving a +0.4% improvement in CTR AUC at scale. We have deployed the model on Taobao's production platform and observed a significant uplift in key business metrics, including a +0.6% increase in Gross Merchandise Value (GMV).
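To make designs (1) and (2) concrete, the following is a minimal NumPy sketch of one heterogeneous encoder layer, not the authors' implementation. It assumes that each semantic block has already been embedded into a single token of width `d_model`, uses a single attention head, and omits layer normalization and the Hiformer layer. The names `init_block_params` and `heterogeneous_encoder_layer`, the block names, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_block_params(rng, d_model, d_ff):
    # Hypothetical parameters for ONE semantic block:
    # its own Q/K/V projections and its own two-layer FFN.
    s = 1.0 / np.sqrt(d_model)
    return {
        "Wq": rng.normal(0.0, s, (d_model, d_model)),
        "Wk": rng.normal(0.0, s, (d_model, d_model)),
        "Wv": rng.normal(0.0, s, (d_model, d_model)),
        "W1": rng.normal(0.0, s, (d_model, d_ff)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(d_ff), (d_ff, d_model)),
    }


def heterogeneous_encoder_layer(tokens, params):
    # tokens: (batch, n_blocks, d_model), one token per semantic block (assumption).
    # params: list of n_blocks dicts from init_block_params.
    _, n_blocks, d_model = tokens.shape

    def project(name):
        # Block i is projected with its own matrix rather than a shared one.
        return np.stack(
            [tokens[:, i] @ params[i][name] for i in range(n_blocks)], axis=1
        )

    q, k, v = project("Wq"), project("Wk"), project("Wv")

    # Standard scaled dot-product attention across blocks.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_model)  # (batch, n_blocks, n_blocks)
    attn = softmax(scores, axis=-1)
    h = tokens + attn @ v  # residual connection

    # Block-specific FFN (ReLU), again with a residual connection.
    out = np.stack(
        [
            h[:, i] + np.maximum(h[:, i] @ params[i]["W1"], 0.0) @ params[i]["W2"]
            for i in range(n_blocks)
        ],
        axis=1,
    )
    return out, attn


rng = np.random.default_rng(0)
block_names = ["user_profile", "item_info", "behaviour_sequence"]  # illustrative blocks
d_model, d_ff, batch = 8, 16, 4
params = [init_block_params(rng, d_model, d_ff) for _ in block_names]
tokens = rng.normal(size=(batch, len(block_names), d_model))
out, attn = heterogeneous_encoder_layer(tokens, params)
```

The only difference from a standard Transformer layer in this sketch is that the projection and FFN weights are indexed by block, so a user-profile token and an item token are never transformed by the same matrices; the attention computation itself is unchanged.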
