
Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search

Lei Chen, Chen Ju, Xu Chen, Zhicheng Wang, Yuheng Jiao, Hongfeng Zhan, Zhaoyang Li, Shihao Xu, Zhixiang Zhao, Tong Jia, Jinsong Lan, Xiaoyong Zhu, Bo Zheng

TL;DR

Pailitao-VL tackles real-time, high-precision multi-modal industrial search by replacing contrastive embeddings with absolute ID-based anchors learned from a billion-scale semantic prototype library, and by replacing isolated pointwise reranking with a hybrid, chunkwise listwise policy that combines local comparative reasoning with globally calibrated absolute relevance. The embedding component uses a three-stage pipeline (MLLM2vec backbone, global prototype head, and end-to-end optimization with additive angular margin) to achieve instance-level discrimination. The reranking component evolves from pointwise to listwise, introducing chunkwise local ranking, an absolute relevance scorer, and a hybrid merging strategy that yields high precision while keeping latency low. Online deployment shows low latency and significant GMV gains, illustrating the practical viability and business value of the approach for large-scale industrial search.
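The additive angular margin mentioned above is a known technique (popularized by ArcFace) for classification against L2-normalized class prototypes: the angle between an embedding and its true prototype is enlarged by a margin m before the softmax, which forces tighter instance-level clusters. The following is a minimal sketch of that loss for a toy prototype-ID head, not the authors' implementation. The function names, the scale s = 30, and the margin m = 0.2 are illustrative assumptions, and a billion-prototype head would in practice need sharding or sampling that this sketch ignores.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def aam_logits(embedding: Sequence[float],
               prototypes: Sequence[Sequence[float]],
               label: int,
               scale: float = 30.0,   # illustrative value, not from the paper
               margin: float = 0.2) -> List[float]:
    """ArcFace-style logits: s*cos(theta_j), using theta_y + m for the target ID.

    Hypothetical sketch: production versions also handle theta_y + m > pi,
    which this simplified version does not.
    """
    e = l2_normalize(embedding)
    logits = []
    for j, proto in enumerate(prototypes):
        p = l2_normalize(proto)
        # Cosine similarity between unit vectors, clamped so acos is safe.
        cos_t = max(-1.0, min(1.0, sum(a * b for a, b in zip(e, p))))
        if j == label:
            # Additive angular margin: enlarge the angle to the true prototype.
            cos_t = math.cos(math.acos(cos_t) + margin)
        logits.append(scale * cos_t)
    return logits


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_z - logits[label]


# Toy example: 3 hypothetical prototype IDs in 2-D; the query belongs to ID 0.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [0.8, 0.6]
loss_margin = cross_entropy(aam_logits(query, prototypes, label=0), 0)
loss_plain = cross_entropy(aam_logits(query, prototypes, label=0, margin=0.0), 0)
print(loss_margin > loss_plain)  # True: the margin makes the target harder to satisfy
```

With m = 0 this reduces to a scaled cosine softmax classifier. The positive margin raises the loss for the same embedding, so training must pull embeddings closer to their own prototype than plain cosine classification would require.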

Abstract

In this work, we present Pailitao-VL, a comprehensive multi-modal retrieval system engineered for high-precision, real-time industrial search. We address three critical challenges in current state-of-the-art solutions: insufficient retrieval granularity, vulnerability to environmental noise, and a prohibitive efficiency-performance gap. Our primary contribution lies in two fundamental paradigm shifts. First, we shift the embedding paradigm from traditional contrastive learning to an absolute ID-recognition task. By anchoring instances to a globally consistent latent space defined by billions of semantic prototypes, we overcome the stochasticity and granularity bottlenecks inherent in existing embedding solutions. Second, we evolve the generative reranker from isolated pointwise evaluation to a compare-and-calibrate listwise policy. By combining chunk-based comparative reasoning with calibrated absolute relevance scoring, the system achieves fine-grained discrimination while avoiding the prohibitive latency typically associated with conventional reranking methods. Extensive offline benchmarks and online A/B tests on Alibaba's e-commerce platform confirm that Pailitao-VL achieves state-of-the-art performance and delivers substantial business impact. This work demonstrates a robust and scalable path for deploying advanced MLLM-based retrieval architectures in demanding, large-scale production environments.
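One way to picture how chunk-based comparative reasoning and calibrated absolute scoring can be merged is the hypothetical sketch below. The local ranker and the absolute scorer are stand-in callables (in the described system both would come from the MLLM reranker), and the fusion rule, a weighted sum of the absolute score and a chunk-local position score, is an assumption made for illustration. The paper's actual hybrid merging strategy may differ.

```python
from typing import Callable, Dict, List, Sequence


def chunkwise_hybrid_rerank(
    candidates: Sequence[str],
    local_rank: Callable[[List[str]], List[str]],   # hypothetical listwise ranker
    absolute_score: Callable[[str], float],         # hypothetical calibrated scorer
    chunk_size: int = 4,
    alpha: float = 0.5,                             # assumed fusion weight
) -> List[str]:
    """Hypothetical fusion of chunk-local listwise order with an absolute score.

    Assumes candidate IDs are unique and absolute scores lie in [0, 1].
    """
    fused: Dict[str, float] = {}
    for start in range(0, len(candidates), chunk_size):
        chunk = list(candidates[start:start + chunk_size])
        ordered = local_rank(chunk)  # comparative ordering inside one chunk only
        n = len(ordered)
        for pos, item in enumerate(ordered):
            # Map local position to [0, 1]: best in chunk -> 1.0, worst -> 0.0.
            local = 1.0 if n == 1 else 1.0 - pos / (n - 1)
            fused[item] = alpha * absolute_score(item) + (1.0 - alpha) * local
    # Global order over all candidates by the fused score.
    return sorted(candidates, key=lambda c: fused[c], reverse=True)


# Toy stand-ins for the reranker's outputs (hypothetical values).
relevance = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4, "e": 0.95}
ranked = chunkwise_hybrid_rerank(
    ["a", "b", "c", "d", "e"],
    local_rank=lambda chunk: sorted(chunk, key=relevance.get, reverse=True),
    absolute_score=relevance.get,
    chunk_size=2,
)
print(ranked)  # ['e', 'a', 'c', 'd', 'b']
```

Because every chunk's top item receives the same local score of 1.0, local positions alone cannot order items drawn from different chunks; the absolute score is what makes them comparable, which is the role calibration plays here. Small chunks also keep each listwise call short and let chunks be processed independently, which is presumably where the latency benefit comes from.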

Paper Structure

This paper contains 23 sections, 20 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Pailitao-VL-Embedding evolves from popular contrastive learning to absolute prototype-ID recognition, enhancing instance-level discriminative capability through three-stage optimization.
  • Figure 2: Pailitao-VL-Reranker evolves from pointwise (Pailitao-VL-Reranker-Point) to listwise (Pailitao-VL-Reranker-List) ranking, becoming more interpretable and efficient while achieving better performance.