Fine-grained auxiliary learning for real-world product recommendation

Mario Almagro, Diego Ortego, David Jimenez

TL;DR

This work tackles the real-world requirement of high automation in product recommendation by addressing the calibration and thresholding of similarity scores. It introduces ALC, an auxiliary learning strategy that boosts coverage through fine-grained embeddings: two auxiliary objectives sharpen the embeddings using the hardest negatives and batch-aware context, and are paired with a Threshold-Consistent Margin (TCM) loss to calibrate scores. The approach is evaluated on LF-AmazonTitles-131K and Tech&Durables across three extreme multi-label classification backbones, achieving state-of-the-art coverage with only modest precision changes. The combined ALC+TCM method offers a practical path toward deployable, high-coverage product retrieval systems, with implications for broader retrieval tasks that require calibrated similarity and reduced manual intervention.

Abstract

Product recommendation is the task of recovering the closest items to a given query within a large product corpus. Generally, one can determine whether top-ranked products are related to the query by applying a similarity threshold: a product whose score exceeds the threshold is deemed relevant; otherwise, manual revision is required. Despite being a well-known problem, the integration of these models in real-world systems is often overlooked. In particular, production systems have strong coverage requirements, i.e., a high proportion of recommendations must be automated. In this paper we propose ALC, an Auxiliary Learning strategy that boosts Coverage through learning fine-grained embeddings. Concretely, we introduce two training objectives that leverage the hardest negatives in the batch to build discriminative training signals between positives and negatives. We validate ALC using three extreme multi-label classification approaches on two product recommendation datasets, LF-AmazonTitles-131K and Tech and Durables (proprietary), demonstrating state-of-the-art coverage rates when combined with a recent threshold-consistent margin loss.
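
The thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes that coverage is the fraction of queries whose top-1 similarity score exceeds the threshold, and that precision is measured only over those automated recommendations; the paper's exact definitions may differ. The function name `coverage_and_precision` and the sample scores are hypothetical and serve only as illustration.

```python
def coverage_and_precision(top1_scores, top1_correct, threshold):
    """Return (coverage, precision) for a given similarity threshold.

    Assumed definitions (not taken from the paper):
      coverage  = share of queries whose top-1 score exceeds the threshold,
                  i.e. recommendations accepted without manual revision
      precision = share of those accepted recommendations that are correct
    """
    if len(top1_scores) != len(top1_correct):
        raise ValueError("scores and correctness flags must have equal length")

    # Keep the correctness flag of every recommendation that passes the threshold.
    automated = [
        correct
        for score, correct in zip(top1_scores, top1_correct)
        if score > threshold
    ]

    coverage = len(automated) / len(top1_scores) if top1_scores else 0.0
    precision = sum(automated) / len(automated) if automated else 0.0
    return coverage, precision


# Hypothetical top-1 scores and whether each top-1 product was truly relevant.
scores = [0.91, 0.42, 0.77, 0.65, 0.30]
correct = [True, False, True, False, True]

cov, prec = coverage_and_precision(scores, correct, threshold=0.6)
print(f"coverage={cov:.2f} precision={prec:.2f}")
```

Under these assumptions, lowering the threshold raises coverage but usually admits more incorrect recommendations, which is why well-calibrated scores matter when a fixed threshold has to serve all queries.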

Paper Structure

This paper contains 11 sections, 7 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: General diagram of ALC.
  • Figure 2: Histogram of top-1 predicted scores for PRIME (top) and for PRIME combined with the ALC and TCM regularizations (bottom) on the Tech&Durables dataset.