
Deep Domain Specialisation for single-model multi-domain learning to rank

Paul Missault, Abdelmaseeh Felfel

TL;DR

The paper tackles the cost and scalability challenges of maintaining domain-specific ranking models by proposing Deep Domain Specialisation (DDS), a single-model approach that preserves domain-specific representations. DDS is compared with Deep Domain Adaptation (DDA) and baseline multi-domain variants, and achieves superior or comparable relevance (measured by NDCG) across two geographic domains while using fewer parameters per domain. Offline evaluations and a large-scale online interleaving experiment on the Amazon AE and SA stores show that DDS can replace multiple domain-specific models without sacrificing performance, and often improves it. The work highlights practical benefits for maintaining and deploying real-world IR systems, and points to future work on scaling to more domains and analysing the observed gains in more depth.
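
The online evaluation relies on interleaving, but this summary does not say which interleaving variant was used. The following is therefore only a minimal sketch of team-draft interleaving, a widely used variant, assuming each model returns a ranked list of item IDs for the same query. The function names `team_draft_interleave` and `credit` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def team_draft_interleave(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    length: int,
    rng: random.Random,
) -> Tuple[List[str], Dict[str, str]]:
    """Merge two rankings and record which model ("team") contributed each item."""
    interleaved: List[str] = []
    team: Dict[str, str] = {}
    count_a = count_b = 0
    idx_a = idx_b = 0
    while len(interleaved) < length:
        # Skip items that either team has already placed.
        while idx_a < len(ranking_a) and ranking_a[idx_a] in team:
            idx_a += 1
        while idx_b < len(ranking_b) and ranking_b[idx_b] in team:
            idx_b += 1
        a_left, b_left = idx_a < len(ranking_a), idx_b < len(ranking_b)
        if not a_left and not b_left:
            break
        # The team with fewer picks goes next; a tie is broken by a coin flip.
        pick_a = a_left and (
            not b_left
            or count_a < count_b
            or (count_a == count_b and rng.random() < 0.5)
        )
        if pick_a:
            item, team[ranking_a[idx_a]] = ranking_a[idx_a], "A"
            count_a += 1
        else:
            item, team[ranking_b[idx_b]] = ranking_b[idx_b], "B"
            count_b += 1
        interleaved.append(item)
    return interleaved, team


def credit(team: Dict[str, str], clicked: Sequence[str]) -> str:
    """Attribute clicks on the interleaved list to the team that placed each item."""
    a = sum(1 for d in clicked if team.get(d) == "A")
    b = sum(1 for d in clicked if team.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"
```

For example, `team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"], 3, random.Random(0))` shows one merged list to the user. Aggregating `credit` outcomes over many queries then indicates which model users prefer.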

Abstract

Information Retrieval (IR) practitioners often train separate ranking models for different domains (geographic regions, languages, stores, websites, ...), as it is believed that training exclusively on in-domain data yields the best performance when sufficient data is available. Despite these performance gains, multiple models are more costly to train, maintain and update than a single model responsible for all domains. Our work explores consolidated ranking models that serve multiple domains. Specifically, we propose a novel architecture, Deep Domain Specialisation (DDS), to consolidate multiple domains into a single model. We compare our proposal against Deep Domain Adaptation (DDA) and a set of baselines for multi-domain models. In our experiments, DDS performed the best overall while requiring fewer parameters per domain than the other baselines. We show the efficacy of our method both with offline experimentation and with a large-scale online experiment on Amazon customer traffic.
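
The abstract does not spell out the DDS architecture, so the sketch below should not be read as DDS. It only illustrates the general idea of one model serving several domains: a linear scorer whose effective weights are the sum of a shared vector and a per-domain vector (equivalent to the feature-augmentation trick for domain adaptation), trained with a pairwise hinge loss. The class name, the domain keys "AE" and "SA", and the learning rate are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class SharedPlusDomainScorer:
    """Hypothetical single-model ranker: shared weights plus per-domain weights."""

    def __init__(self, n_features: int, domains: Sequence[str]) -> None:
        # Parameters used (and updated) by every domain.
        self.shared: List[float] = [0.0] * n_features
        # Parameters used and updated only by their own domain.
        self.per_domain: Dict[str, List[float]] = {d: [0.0] * n_features for d in domains}

    def score(self, x: Sequence[float], domain: str) -> float:
        return dot(self.shared, x) + dot(self.per_domain[domain], x)

    def rank(self, docs: Dict[str, Sequence[float]], domain: str) -> List[str]:
        # Sort document IDs by descending score for the requested domain.
        return sorted(docs, key=lambda d: self.score(docs[d], domain), reverse=True)

    def pairwise_step(self, x_pos: Sequence[float], x_neg: Sequence[float],
                      domain: str, lr: float = 0.1) -> None:
        """One SGD step on the hinge loss max(0, 1 - (s_pos - s_neg))."""
        if self.score(x_pos, domain) - self.score(x_neg, domain) >= 1.0:
            return  # Pair already separated by the margin: zero loss, zero gradient.
        # The loss gradient w.r.t. both weight vectors is -(x_pos - x_neg),
        # so a descent step adds lr * (x_pos - x_neg) to each of them.
        domain_w = self.per_domain[domain]
        for i, (p, n) in enumerate(zip(x_pos, x_neg)):
            self.shared[i] += lr * (p - n)
            domain_w[i] += lr * (p - n)


model = SharedPlusDomainScorer(2, ["AE", "SA"])
model.pairwise_step([1.0, 0.0], [0.0, 1.0], "AE")
print(model.rank({"p": [1.0, 0.0], "n": [0.0, 1.0]}, "AE"))  # ['p', 'n']
```

Training on one domain moves the shared weights, which every domain uses, while each per-domain vector only captures what is specific to its domain. The sketch applies no regularisation to either vector, which a practical model would likely need.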

Paper Structure

This paper contains 17 sections, 1 figure and 3 tables.

Figures (1)

  • Figure 5: NDCG scores of 5 runs of the different methods and baselines used in this paper.
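
The figure reports NDCG, but this summary does not state the gain function, the rank cutoff or how ideal rankings are formed. The following minimal sketch of NDCG@k therefore assumes exponential gain (2^rel - 1), a log2(rank + 1) discount, and an ideal ranking built from the judged items of the same list.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    # Exponential gain with a log2(rank + 1) discount; ranks start at 1.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains_in_ranked_order: Sequence[float], k: int) -> float:
    # Normalise by the DCG of the same items sorted by decreasing relevance.
    ideal = dcg_at_k(sorted(gains_in_ranked_order, reverse=True), k)
    return dcg_at_k(gains_in_ranked_order, k) / ideal if ideal > 0 else 0.0


print(ndcg_at_k([3, 2, 1], 3))  # 1.0 (already ideally ordered)
```

Scores from several training runs, such as the 5 runs shown per method, would typically be computed per query and averaged before comparing methods.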