CSMF: Cascaded Selective Mask Fine-Tuning for Multi-Objective Embedding-Based Retrieval

Hao Deng, Haibo Xing, Kanefumi Matsuyama, Moyu Zhang, Jinxin Hu, Hong Wen, Yu Zhang, Xiaoyi Zeng, Jing Zhang

TL;DR

CSMF tackles multi-objective embedding-based retrieval by introducing cascaded selective mask fine-tuning that allocates independent learning space for each objective without adding parameters. It combines exposure pre-training with sequential fine-tuning on click and conversion tasks, augmented by CPP pruning and AML loss to mitigate gradient conflicts and forgetting. The framework yields a linearly fused multi-objective score without increasing final vector dimensions, enabling efficient online serving and adaptable objective weighting. Real-world experiments show consistent offline gains on industrial and AliExpress datasets and positive online A/B results in RPM, CTR, and CVR, underscoring practical impact for multi-objective EBR systems.

Abstract

Multi-objective embedding-based retrieval (EBR) has become increasingly critical due to the growing complexity of user behaviors and commercial objectives. Traditional approaches often suffer from data sparsity and limited information sharing between objectives. Recent methods that pair a shared network with a dedicated sub-network for each objective partially address these limitations. However, such methods significantly increase the number of model parameters, which raises retrieval latency, and they have limited ability to model causal relationships between objectives. To address these challenges, we propose Cascaded Selective Mask Fine-Tuning (CSMF), a novel method that enhances both retrieval efficiency and serving performance for multi-objective EBR. The CSMF framework selectively masks model parameters to free up independent learning space for each objective, leveraging the cascading relationships between objectives during sequential fine-tuning. Without increasing network parameters or online retrieval overhead, CSMF computes a linearly weighted fusion score over the probabilities of multiple objectives and supports flexible adjustment of each objective's weight across recommendation scenarios. Experimental results on real-world datasets demonstrate the superior performance of CSMF, and online experiments validate its significant practical value.
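The linearly weighted fusion mentioned in the abstract can be illustrated with a small sketch. It rests on an assumption that is not stated in this section: that the fixed-size user and item vectors are split into one segment per objective, so the weighted sum of per-objective inner products equals a single inner product after the weights are folded into the user vector. The function names, segment sizes, and weights below are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def fused_score(user_segments, item_segments, weights):
    """Weighted sum of per-objective inner products (one segment per objective)."""
    return sum(w * dot(u, v)
               for w, u, v in zip(weights, user_segments, item_segments))


def weighted_user_vector(user_segments, weights):
    """Fold the objective weights into the user side and concatenate the segments."""
    out = []
    for w, seg in zip(weights, user_segments):
        out.extend(w * x for x in seg)
    return out


def concat(segments):
    """Concatenate per-objective segments into one flat vector."""
    return [x for seg in segments for x in seg]


# Hypothetical segments for exposure, click, and conversion (assumed layout).
user = [[1.0, 2.0], [0.5], [3.0]]
item = [[0.5, 0.25], [2.0], [1.0]]
k = [0.2, 0.3, 0.5]  # illustrative objective weights

explicit = fused_score(user, item, k)
single = dot(weighted_user_vector(user, k), concat(item))
print(explicit, single)  # the two values agree up to floating-point rounding
```

Under this assumption, changing the weights only changes the query-side vector, so an item index would not need to be rebuilt when a scenario calls for a different objective mix.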

Paper Structure

This paper contains 22 sections, 8 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The cascading relationships among user actions. (a) The recommendation page. (b) The product detail page, displayed after the user clicks on a product, prominently features a "Buy Now" button. (c) The checkout page is displayed after the user clicks the "Buy Now" button.
  • Figure 2: Methods for multi-objective EBR. (a) Separate EBR models for each objective [yi2019sampling]. (b) A unified EBR model trained on a mixed multi-objective dataset [zheng2022multi, zhao2021distillation]. (c) MOE-based methods for multi-objective EBR model [xu2022mixture]. (d) Our proposed CSMF.
  • Figure 3: Illustration of CSMF Framework. Taking one of the matrices in the user or item tower as an example, the CSMF framework for three objectives (exposure, click, and conversion) is organized into three stages. First, the backbone model is pre-trained on large-scale exposure data. The model then undergoes two fine-tuning stages with click and conversion data, respectively. Before fine-tuning, CSMF selectively masks redundant parameters to free up space for the new tasks. To ensure accuracy, unpruned parameters are fine-tuned again on a small subset of the current stage's data before being frozen. This iterative process (pre-train → selective mask → accuracy recovery → fine-tune) enables the sequential optimization of multiple objectives, addressing knowledge sharing and catastrophic forgetting issues.
  • Figure 4: The performance of CSMF with different pruning ratios in the CPP method.
  • Figure 5: The performance of CSMF under different hyperparameters ($\eta$, $k_{o}$, $k_{r}$, and $k_{d}$). $\eta$ denotes the adaptive coefficient in the AML method, while $k_{o}$, $k_{r}$, and $k_{d}$ represent the weights for the click objective, conversion objective, and exposure objective, respectively.
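
The staged loop described in the Figure 3 caption (pre-train → selective mask → accuracy recovery → fine-tune) can be sketched on a single flat weight list. This is a minimal illustration under stated assumptions, not the authors' implementation: plain magnitude pruning stands in for the CPP method, the accuracy-recovery step is only marked by a comment, and `selective_mask`, `cascade`, and `toy_train` are hypothetical names.

```python
def selective_mask(weights, trainable, keep_ratio):
    """Split trainable positions into (kept, freed) by magnitude.

    Magnitude pruning is an assumed stand-in for CPP. Freed positions are
    zeroed so the next objective can learn in that space.
    """
    idx = sorted(trainable, key=lambda i: abs(weights[i]), reverse=True)
    n_keep = max(1, round(keep_ratio * len(idx)))
    kept, freed = idx[:n_keep], idx[n_keep:]
    for i in freed:
        weights[i] = 0.0
    return set(kept), set(freed)


def cascade(weights, stages, keep_ratio, train_fn):
    """Train objectives in order; each stage only updates positions it may use."""
    owner = {}
    trainable = set(range(len(weights)))
    for s, stage in enumerate(stages):
        train_fn(weights, trainable, stage)  # pre-train or fine-tune
        if s < len(stages) - 1:
            kept, freed = selective_mask(weights, trainable, keep_ratio)
            # Accuracy recovery would briefly re-train `kept` here (omitted).
        else:
            kept, freed = set(trainable), set()
        for i in kept:
            owner[i] = stage  # frozen from now on
        trainable = freed
    return owner


def toy_train(weights, trainable, stage):
    """Hypothetical trainer: writes a deterministic value to trainable slots only."""
    for i in trainable:
        weights[i] = float(i + 1)


w = [0.0] * 8
owner = cascade(w, ["exposure", "click", "conversion"], 0.5, toy_train)
print(owner)
```

Because each stage writes only to positions that earlier stages released, the parameters frozen for exposure and click are untouched by later fine-tuning, which is the property the caption associates with avoiding catastrophic forgetting.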