HHFT: Hierarchical Heterogeneous Feature Transformer for Recommendation Systems

Liren Yu, Wenming Zhang, Silu Zhou, Tao Zhang, Zhixuan Zhang, Dan Ou

TL;DR

HHFT introduces a hierarchical heterogeneous feature transformer for CTR prediction in ranking systems, combining semantic feature partitioning, block-specific Transformer encoders, and Hiformer layers to explicitly model high-order interactions while preserving feature semantics. The architecture scales well, with empirical scaling laws showing width- and high-order-focused gains. It improves CTR AUC by +0.4% over DNN baselines and, deployed on Taobao, raises GMV by +0.6%. Offline experiments and ablations confirm the contributions of semantic partitioning, heterogeneous parameterization, and Hiformer components. The work provides a practical, deployment-ready framework for industrial recommender systems and outlines future directions toward joint ranking across search, recommendation, and advertising.

Abstract

We propose HHFT (Hierarchical Heterogeneous Feature Transformer), a Transformer-based architecture tailored for industrial CTR prediction. HHFT addresses the limitations of DNNs through three key designs: (1) Semantic Feature Partitioning: grouping heterogeneous features (e.g., user profile, item information, behaviour sequence) into semantically coherent blocks to preserve domain-specific information; (2) Heterogeneous Transformer Encoder: adopting block-specific QKV projections and FFNs to avoid semantic confusion between distinct feature types; (3) Hiformer Layer: capturing high-order interactions across features. Our findings reveal that Transformers significantly outperform DNN baselines, achieving a +0.4% improvement in CTR AUC at scale. We have successfully deployed the model on Taobao's production platform, observing a significant uplift in key business metrics, including a +0.6% increase in Gross Merchandise Value (GMV).
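
To make designs (1) and (2) concrete, the following is a minimal sketch of one encoder layer in which every semantic block owns its own QKV projections and FFN. It is an illustration under stated assumptions, not the authors' implementation: it assumes one token per block, a single attention head, no layer normalization, and random weights. The names `init_block_params` and `heterogeneous_layer`, the block names, and the dimensions are hypothetical. The Hiformer layer is omitted because the abstract does not specify it.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random matrix; stands in for learned weights (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_block_params(block_names, d, d_ff, seed=0):
    """Create a separate parameter set per semantic block (hypothetical helper)."""
    rng = random.Random(seed)
    return {
        name: {
            "Wq": rand_matrix(d, d, rng),
            "Wk": rand_matrix(d, d, rng),
            "Wv": rand_matrix(d, d, rng),
            "W1": rand_matrix(d_ff, d, rng),
            "W2": rand_matrix(d, d_ff, rng),
        }
        for name in block_names
    }


def heterogeneous_layer(tokens, params):
    """One attention + FFN layer with block-specific weights.

    tokens: dict mapping block name -> embedding (list of d floats).
    Returns (outputs, attention_weights), both keyed by block name.
    """
    names = list(tokens)
    d = len(tokens[names[0]])
    # Each block projects its own token with its own Wq / Wk / Wv.
    q = {n: matvec(params[n]["Wq"], tokens[n]) for n in names}
    k = {n: matvec(params[n]["Wk"], tokens[n]) for n in names}
    v = {n: matvec(params[n]["Wv"], tokens[n]) for n in names}

    outputs, attention = {}, {}
    for n in names:
        # Scaled dot-product attention of block n over all blocks.
        scores = [
            sum(a * b for a, b in zip(q[n], k[m])) / math.sqrt(d) for m in names
        ]
        weights = softmax(scores)
        attention[n] = weights
        mixed = [sum(w * v[m][i] for w, m in zip(weights, names)) for i in range(d)]
        h = [x + y for x, y in zip(tokens[n], mixed)]  # residual connection
        # Block-specific two-layer FFN with ReLU, plus a second residual.
        hidden = [max(0.0, z) for z in matvec(params[n]["W1"], h)]
        ffn = matvec(params[n]["W2"], hidden)
        outputs[n] = [x + y for x, y in zip(h, ffn)]
    return outputs, attention


# Usage: three semantic blocks, each summarized as one 4-dimensional token.
blocks = ["user_profile", "item_info", "behaviour_sequence"]
params = init_block_params(blocks, d=4, d_ff=8)
tokens = {name: [0.1 * (i + 1)] * 4 for i, name in enumerate(blocks)}
out, attn = heterogeneous_layer(tokens, params)
```

A standard Transformer layer would share one set of `Wq`, `Wk`, `Wv`, and FFN weights across all tokens. Keying the parameters by block name is what keeps, for example, user-profile and item features from being projected through the same matrices.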

Paper Structure

This paper contains 21 sections, 4 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: HHFT Architecture.
  • Figure 2: AUC gain vs. dense parameter scale ratio.