FLAME: Condensing Ensemble Diversity into a Single Network for Efficient Sequential Recommendation

WooJoo Kim, JunYoung Kim, JaeHyung Lim, SeongJin Choi, SeongKu Kang, HwanJo Yu

Abstract

Sequential recommendation requires capturing diverse user behaviors, which a single network often fails to do. While ensemble methods mitigate this by leveraging multiple networks, training them all from scratch leads to high computational cost and instability from noisy mutual supervision. We propose Frozen and Learnable networks with Aligned Modular Ensemble (FLAME), a novel framework that condenses ensemble-level diversity into a single network for efficient sequential recommendation. During training, FLAME simulates exponential diversity using only two networks via modular ensemble. By decomposing each network into sub-modules (e.g., layers or blocks) and dynamically combining them, FLAME generates a rich space of diverse representation patterns. To stabilize this process, we pretrain and freeze one network to serve as a semantic anchor and employ guided mutual learning, which aligns the diverse representations into the space of the remaining learnable network, ensuring robust optimization. Consequently, at inference, FLAME utilizes only the learnable network, achieving ensemble-level performance with zero overhead compared to a single network. Experiments on six datasets show that FLAME outperforms state-of-the-art baselines, achieving up to 7.69$\times$ faster convergence and a 9.70% improvement in NDCG@20. We provide the source code of FLAME at https://github.com/woo-joo/FLAME_SIGIR26.
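To make the modular ensemble concrete, below is a minimal PyTorch sketch under stated assumptions: the names `frozen_blocks`, `learnable_blocks`, and `modular_forward`, the use of plain linear layers as sub-modules, and the binary path mask are all illustrative, not the authors' implementation. The key combinatorial point carries over directly: with $N{=}2$ networks of $M$ sub-modules each, sampling a binary mask per step selects one of $2^M$ distinct representation paths.

```python
import torch
import torch.nn as nn

M, d = 4, 64  # sub-modules per network and hidden size (illustrative)

# Two networks decomposed into M sub-modules each. In the paper these
# would be, e.g., Transformer layers of a sequential recommender; plain
# linear layers stand in here to keep the sketch self-contained.
frozen_blocks = nn.ModuleList(nn.Linear(d, d) for _ in range(M))
learnable_blocks = nn.ModuleList(nn.Linear(d, d) for _ in range(M))
for p in frozen_blocks.parameters():
    p.requires_grad_(False)  # the frozen network acts as a semantic anchor

def modular_forward(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Route x through one of the 2^M block combinations:
    mask[m] == 1 picks the learnable m-th sub-module, 0 the frozen one."""
    h = x
    for m in range(M):
        h = learnable_blocks[m](h) if mask[m] else frozen_blocks[m](h)
    return h

x = torch.randn(8, d)             # a batch of sequence representations
mask = torch.randint(0, 2, (M,))  # one randomly sampled path
z = modular_forward(x, mask)      # one of 2^M diverse representations
```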

Figures (9)

  • Figure 1: Training curves for Single, Ensemble-Scratch, and Ensemble-Guide.
  • Figure 2: Conceptual illustration of (a) conventional ensemble and (b) proposed modular ensemble. With $N$ networks, conventional ensemble produces $N$ distinct representations. When each network is decomposed into $M$ sub-modules, modular ensemble generates $N^M$ different representations.
  • Figure 3: (a) t-SNE visualization of sequence representations for the conventional ensemble (red and blue shaded areas) and the modular ensemble (red, yellow, green, and blue points). (b) Performance of each individual network in Ensemble-Scratch and Ensemble-Guide. (c, d) PER maps for the conventional ensemble and FLAME.
  • Figure 4: Illustration of (a) the training and (b) the inference procedure of the proposed FLAME. In (a), the Learnable network is optimized via the next-item prediction task. In parallel, $2^M$ diverse representations are generated by a modular ensemble of the Frozen and Learnable networks and then aligned into a unified semantic space (a minimal sketch of this alignment step follows the list below). In (b), only the Learnable network is utilized to enable efficient inference.
  • Figure 5: Model performance of FLAME and baselines with a varying number of representations.
  • ...and 4 more figures
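As referenced in the Figure 4 description, here is a hedged sketch of the guided mutual learning step, continuing the modular-ensemble sketch above (it reuses `modular_forward` and `M`). The abstract states only that the $2^M$ diverse representations are aligned into the learnable network's space; the MSE objective, the all-learnable target path, and the number of sampled paths below are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def guided_alignment_loss(x: torch.Tensor, num_paths: int = 4) -> torch.Tensor:
    """Pull representations from randomly sampled modular-ensemble paths
    toward the fully learnable path, condensing ensemble diversity into
    the single learnable network (alignment objective assumed: MSE)."""
    # Target space: the all-learnable path (mask of all ones).
    target = modular_forward(x, torch.ones(M, dtype=torch.long))
    loss = x.new_zeros(())
    for _ in range(num_paths):
        mask = torch.randint(0, 2, (M,))  # sample one of the 2^M paths
        z = modular_forward(x, mask)      # mixed frozen/learnable blocks
        loss = loss + F.mse_loss(z, target)
    return loss / num_paths
```

During training this term would be combined with the next-item prediction loss; at inference only the learnable network (the all-ones path) is executed, which is why serving cost matches a single network.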