Hierarchical LoRA MoE for Efficient CTR Model Scaling

Zhichen Zeng, Mengyue Hang, Xiaolong Liu, Xiaoyi Liu, Xiao Lin, Ruizhong Qiu, Tianxin Wei, Zhining Liu, Siyang Yuan, Chaofei Yang, Yiqun Liu, Hang Yin, Jiyan Yang, Hanghang Tong

TL;DR

HiLoMoE tackles the need for scalable, efficient CTR prediction by unifying vertical and horizontal scaling through a hierarchical MoE framework built on rank-1 LoRA experts. A hierarchical routing scheme enables combinatorially diverse expert paths while allowing heavy computations to run in parallel after routing scores are determined, preserving inference efficiency. A principled three-stage training pipeline with auxiliary losses stabilizes optimization and promotes expert diversity, enabling robust performance across datasets. Empirically, HiLoMoE achieves about 0.20% average AUC gains and around 18.5% FLOPs reduction compared to non-MoE baselines, demonstrating strong practical gains for large-scale CTR systems.

Abstract

Deep models have driven significant advances in click-through rate (CTR) prediction. While vertical scaling via layer stacking improves model expressiveness, the layer-by-layer sequential computation poses challenges to efficient scaling. Conversely, horizontal scaling through Mixture of Experts (MoE) achieves efficient scaling by activating a small subset of experts in parallel, but flat MoE layers may struggle to capture the hierarchical structure inherent in recommendation tasks. To push the Return-On-Investment (ROI) boundary, we explore the complementary strengths of both directions and propose HiLoMoE, a hierarchical LoRA MoE framework that enables holistic scaling in a parameter-efficient manner. Specifically, HiLoMoE employs lightweight rank-1 experts for parameter-efficient horizontal scaling, and stacks multiple MoE layers with hierarchical routing to enable combinatorially diverse expert compositions. Unlike conventional stacking, HiLoMoE routes based on prior layer scores rather than outputs, allowing all layers to execute in parallel. A principled three-stage training framework ensures stable optimization and expert diversity. Experiments on four public datasets show that HiLoMoE achieves a better performance-efficiency tradeoff, with an average AUC improvement of 0.20% and an 18.5% reduction in FLOPs compared to the non-MoE baseline.
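The two mechanisms named in the abstract, rank-1 experts and routing on prior-layer scores, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the parameter shapes (`router_in`, `router_prev`, `U`, `V`), the softmax gating, and the additive form of the router are all assumptions made for illustration. The sketch shows only that gates which depend on the input and on earlier gates (never on layer outputs) can be computed for every layer up front, and that a gated sum of rank-1 updates can be applied without building a per-token weight matrix.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def routing_scores(x, router_in, router_prev):
    """Gates for all L layers, computed without any layer outputs.

    Assumed (hypothetical) shapes:
      x:           (N, d)     input tokens
      router_in:   (L, d, K)  per-layer projection of the input
      router_prev: (L, K, K)  per-layer projection of the previous gates
    Returns an array of shape (L, N, K).
    """
    L, d, K = router_in.shape
    prev = np.zeros((x.shape[0], K))  # no previous scores before layer 0
    scores = []
    for l in range(L):
        # Layer l routes on the input and on layer l-1's scores only.
        logits = x @ router_in[l] + prev @ router_prev[l]
        prev = softmax(logits)
        scores.append(prev)
    return np.stack(scores)

def rank1_moe_layer(h, W, U, V, gates):
    """Apply h @ (W + sum_k g_k * outer(v_k, u_k)) for each token.

    h: (N, d), W: (d, d) shared weight, U and V: (K, d), gates: (N, K).
    The rank-1 terms cost O(N K d) instead of forming N dense matrices.
    """
    proj = h @ V.T                       # (N, K): inner products <h, v_k>
    return h @ W + (gates * proj) @ U    # (N, d)
```

Because `routing_scores` touches only small `K`-dimensional quantities, it is cheap relative to the `d x d` projections; how the heavy per-layer computation is then parallelized is specific to the paper and is not reproduced here.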

Paper Structure

This paper contains 38 sections, 3 theorems, 7 equations, 9 figures, 2 tables.

Key Result

Proposition 1

For an $L$-layer $K$-expert HiLoMoE operating on a $d$-dimensional sequence of length $N$, the space complexity is $\mathcal{O}(KLd)$ and the time complexity is $\mathcal{O}(Nd^2)$.
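As a rough illustration of the $\mathcal{O}(KLd)$ space bound, the following sketch counts expert parameters under the assumption that each rank-1 expert stores two $d$-dimensional vectors, and compares this with a hypothetical alternative in which every expert is a full $d \times d$ matrix. The function names and the example sizes are illustrative only; shared (non-expert) weights are not counted, and the paper's own accounting may differ.

```python
def rank1_expert_params(num_layers: int, num_experts: int, dim: int) -> int:
    # Assumption: each rank-1 expert holds two vectors of length dim.
    return num_layers * num_experts * 2 * dim

def dense_expert_params(num_layers: int, num_experts: int, dim: int) -> int:
    # Hypothetical baseline: each expert is a full dim x dim matrix.
    return num_layers * num_experts * dim * dim

if __name__ == "__main__":
    L, K, d = 3, 8, 64  # illustrative sizes, not taken from the paper
    r1 = rank1_expert_params(L, K, d)
    full = dense_expert_params(L, K, d)
    print(r1, full, full // r1)  # prints "3072 98304 32"
```

Under these assumptions the rank-1 count grows linearly in each of $K$, $L$, and $d$, and the saving over dense experts is a factor of $d/2$.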

Figures (9)

  • Figure 1: Motivation for Hierarchical MoE. Recommendation follows a hierarchical structure, where user–item interactions span multiple granularities. By hierarchically selecting experts such as Electronics (category), Laptop (product), and Apple (brand), the model collaboratively delivers personalized recommendations at different levels, effectively leveraging hierarchical knowledge for more accurate predictions.
  • Figure 2: An overview of the proposed HiLoMoE which includes $L$ layers, each containing $K$ rank-1 experts.
  • Figure 3: A hierarchical routing strategy selects the current experts based on selections from the previous layers.
  • Figure 4: A 3-stage training framework for stable training.
  • Figure 5: Horizontal scaling on #experts.
  • ...and 4 more figures

Theorems & Definitions (3)

  • Proposition 1: Complexity Analysis
  • Proposition 2: Model Scaling
  • Proposition 2: Model Scaling