SEQ+MD: Learning Multi-Task as a SEQuence with Multi-Distribution Data

Siqi Wang; Audrey Zhijiao Chen; Austin Clapp; Sheng-Min Shih; Xiaoting Zhao

TL;DR

SEQ+MD addresses dual challenges in global e-commerce search: learning multiple tasks that unfold sequentially (e.g., click → add to cart → purchase) and handling region-specific input distributions. The SEQ component treats tasks as a sequence, sharing information through a GRU while enforcing a non-increasing output probability with a Descending Probability Regularizer, and the MD module splits features into country-driven and invariant groups, applying a country-conditioned mask to align multi-distribution inputs. Experiments on in-house data show SEQ+MD improves high-value purchase tasks while keeping clicks stable, and the MD module provides a plug-and-play boost to existing MTL baselines. The approach demonstrates strong transferability to additional tasks and clear alignment with regional shopping preferences, offering practical benefits for global e-commerce ranking systems.
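The non-increasing constraint described above can be illustrated with a small sketch. The exact form of the Descending Probability Regularizer is not given in this summary, so the hinge-style penalty below is an assumption, and the function name `descending_probability_penalty` is hypothetical. It only shows the idea that a later task in the funnel (e.g., purchase) should not receive a higher probability than an earlier one (e.g., click).

```python
def descending_probability_penalty(probs):
    """Hinge-style penalty on any increase along the task sequence.

    probs: per-task probabilities in funnel order, e.g.
    [p_click, p_add_to_cart, p_purchase].
    Assumed form for illustration; not taken from the paper.
    """
    # Each adjacent pair contributes only when the later task's
    # probability exceeds the earlier task's probability.
    return sum(max(0.0, later - earlier)
               for earlier, later in zip(probs, probs[1:]))


# A properly descending sequence incurs no penalty.
ok = descending_probability_penalty([0.6, 0.3, 0.1])

# Here add-to-cart (0.5) exceeds click (0.2), so that pair is penalized.
bad = descending_probability_penalty([0.2, 0.5, 0.1])
```

In training, a term like this would presumably be added to the per-task losses with a weighting coefficient; that coefficient and its value are not stated in this summary.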

Abstract

In e-commerce, the order in which search results are displayed when a customer tries to find relevant listings can significantly impact their shopping experience and search efficiency. Tailored re-ranking systems based on relevance and engagement signals have often been shown to improve sales and gross merchandise value (GMV). Designing algorithms for this purpose is even more challenging when shops are not restricted to domestic buyers but can sell globally to international buyers. A solution needs to incorporate the shopping preferences and cultural traditions of different buyer markets. We propose the SEQ+MD framework, which integrates sequential learning for multi-task learning (MTL) with a feature-generated region mask for multi-distribution input. This approach leverages the sequential order within tasks and accounts for regional heterogeneity, enhancing performance on multi-source data. Evaluations on in-house data showed a strong increase in high-value engagement, including add-to-cart and purchase, while keeping click performance neutral compared to state-of-the-art baseline models. Additionally, our multi-regional learning module is "plug-and-play" and can be easily adapted to enhance other MTL applications.

Paper Structure (16 sections, 2 equations, 8 figures, 3 tables)

This paper contains 16 sections, 2 equations, 8 figures, 3 tables.

Introduction
Related Work
Method
Problem Definition
Learning Multi-Task as A SEQuence
Learning with Multi-Distribution Input
Experiments
Baseline Models
Datasets and Metrics
Results
Discussions
Will the sequential learning model benefit from more tasks?
Transferability from two-task to three-task
Ablation studies
How effective is the MD module when compared to models trained with single regional data?
...and 1 more sections

Figures (8)

Figure 1: MTL Architecture Comparison. (a) Prior work (MMoE, Ma et al., 2018; PLE, Tang et al., 2020; AdaTT, Li et al., 2023) uses experts and gates for task knowledge sharing, with variations in whether the expert or gate is shared among tasks. (b) Our SEQ learns multi-task as a sequence, where task knowledge is shared through sequence tokens.
Figure 2: Regional Difference Examples. (a) The same search query on different regional sites should display different listings to reflect local preferences. For example, GB (United Kingdom) shoppers often choose cookie boxes as birthday gifts, while Canadian shoppers favor birthday cards. (b) Feature distribution shifts across countries. In Canada (CA) and the UK (GB), some features display an entirely different distribution pattern, posing a challenge for the model to learn.
Figure 3: SEQ+MD overall architecture. (a) Feature processing. The input is split into three parts: country features, dependent features, and invariant features. Invariant features are processed into a sequence input with MLP blocks, and then the features are output from the stage 1 RNN as a sequence. Country features and dependent features are processed through our multi-distribution (MD) learning module, with each task having its own country mask weights. More details about the multi-distribution adaptor module can be found in Fig. 4. (b) Multi-task learning. The concatenated features pass through the following RNN layers, providing the model's final output scores for each task. Note that the RNN blocks illustrate the model's architecture, and the number of layers can vary.
Figure 4: Multi-Distribution Adaptor Module (MD). The input is broken down into three parts: Country features (the cause of the distribution difference), dependent features (the features with multi-distributions), and invariant features (the features with consistent distributions). The country features generate a weight mask through an MLP block, which is then element-wise multiplied with the dependent features. This product feature is processed through an MLP, producing transformed dependent features that are assumed to be invariant. These are then concatenated with the original invariant features from the input to create the transformed input. This transformed input can then be passed to any MTL models for further processing.
Figure 5: Transferability of SEQ+MD from two-task to three-task models is evaluated by comparing the performance of shared-bottom (Caruana, 1997) and SEQ+MD models trained on three-task data with the SEQ+MD model trained on two-task data. Remarkably, despite the SEQ+MD model not being trained on add-to-cart data, it still shows improved performance on the add-to-cart and purchase tasks when compared to the shared-bottom (Caruana, 1997) model. See the section "Transferability from two-task to three-task" for the discussion.
...and 3 more figures
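The data flow in the Figure 4 caption (country features produce a mask, the mask multiplies the dependent features, an MLP transforms the product, and the result is concatenated with the invariant features) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a single linear layer stands in for each MLP block, the sigmoid on the mask and the ReLU on the transform are assumed activations, and all names, dimensions, and random weights are hypothetical.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for one linear layer (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Compute y = W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def md_adaptor(country, dependent, invariant, mask_layer, transform_layer):
    """Sketch of the multi-distribution adaptor flow from the Figure 4 caption."""
    # 1. Country features generate a weight mask (sigmoid is an assumption).
    mask = [1.0 / (1.0 + math.exp(-z)) for z in linear(country, *mask_layer)]
    # 2. The mask is multiplied element-wise with the dependent features.
    masked = [m * d for m, d in zip(mask, dependent)]
    # 3. The product passes through a transform layer (ReLU is an assumption).
    transformed = [max(0.0, z) for z in linear(masked, *transform_layer)]
    # 4. Transformed dependent features are concatenated with invariant features.
    return transformed + invariant


rng = random.Random(0)
mask_layer = init_layer(rng, n_in=2, n_out=3)       # country dim 2 -> mask dim 3
transform_layer = init_layer(rng, n_in=3, n_out=4)  # dependent dim 3 -> hidden dim 4

country = [1.0, 0.0]          # hypothetical one-hot country encoding
dependent = [0.5, -1.2, 2.0]  # features whose distribution varies by country
invariant = [0.1, 0.9]        # features with a consistent distribution

transformed_input = md_adaptor(country, dependent, invariant, mask_layer, transform_layer)
```

Because the module only rewrites the input vector, `transformed_input` could be handed to any downstream MTL model, which is what the caption's "plug-and-play" description suggests. Figure 3 additionally notes that each task has its own country mask weights, which would correspond to one `mask_layer` per task in this sketch.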
