Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings

Aysenur Kulunk; Berk Taskin; M. Furkan Eseoglu; H. Bahadir Sahin

TL;DR

The paper tackles duplicate product listings in large-scale Turkish e-commerce by building a domain-specific, multimodal deduplication system. It combines a BERTurk-based Turkish text encoder with a Masked AutoEncoder-based image encoder, each producing compact 128-dimensional embeddings, and adds a dedicated decider model that fuses text and image vectors for fast, category-agnostic duplicate classification. Using Milvus vector search with IVF_FLAT indexing, the system reaches a macro-F1 of 0.90, outperforming a third-party baseline (0.83), while keeping memory use and latency low enough for catalogs of hundreds of millions of items. The work demonstrates scalable, efficient, and accurate deduplication with potential for deployment at scale (116M product vectors daily) and outlines two future directions: richer multimodal integration and support for additional languages.
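To make the IVF_FLAT idea concrete, the following is a minimal pure-Python sketch of an inverted-file index with uncompressed ("flat") vectors: the vectors are clustered into `nlist` cells, and a query scans only the `nprobe` cells whose centroids are closest. It is not the paper's code and does not use Milvus. The helper names, the toy data sizes, and the plain Lloyd clustering are assumptions chosen for illustration; `nlist` and `nprobe` mirror the parameter names Milvus uses for this index type.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(centroids, v):
    """Index of the centroid closest to v (lowest index wins ties)."""
    return min(range(len(centroids)), key=lambda c: squared_l2(centroids[c], v))


def build_ivf(vectors, nlist, iters=5, seed=0):
    """Cluster vectors with a few Lloyd iterations and build inverted lists.

    Returns (centroids, lists), where lists[c] holds the ids of the vectors
    assigned to centroid c. Vectors are stored uncompressed.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest_centroid(centroids, v)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a cell is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    lists = [[] for _ in range(nlist)]
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(centroids, v)].append(idx)
    return centroids, lists


def search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe closest cells; return up to k (distance, id) pairs."""
    order = sorted(range(len(centroids)),
                   key=lambda c: squared_l2(centroids[c], query))
    candidates = [(squared_l2(vectors[i], query), i)
                  for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates)[:k]


# Toy demo: 200 random 128-dimensional vectors (hypothetical data).
demo_rng = random.Random(1)
demo_vectors = [[demo_rng.gauss(0.0, 1.0) for _ in range(128)] for _ in range(200)]
demo_centroids, demo_lists = build_ivf(demo_vectors, nlist=4)
print(search(demo_vectors, demo_centroids, demo_lists, demo_vectors[0], k=3, nprobe=1))
```

Setting `nprobe` equal to `nlist` scans every cell and so reproduces an exhaustive search. Smaller values trade recall for speed, which is the trade-off that makes this kind of index attractive for very large catalogs.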

Abstract

In large-scale e-commerce marketplaces, duplicate product listings frequently cause consumer confusion and operational inefficiencies, eroding trust in the platform and increasing costs. Traditional keyword-based search methods struggle to identify duplicates accurately because they rely on exact textual matches and overlook the semantic similarities between product titles. To address these challenges, we introduce a scalable, multimodal product deduplication system designed specifically for the e-commerce domain. Our approach employs a domain-specific text model built on the BERT architecture together with Masked AutoEncoders for image representations. Both models are augmented with dimensionality reduction techniques to produce compact 128-dimensional embeddings without significant information loss. In addition, we developed a novel decider model that leverages both text and image vectors. By integrating these feature extraction mechanisms with Milvus, an optimized vector database, our system supports efficient, high-precision similarity search across product catalogs exceeding 200 million items while consuming only 100 GB of system RAM. Empirical evaluations show that our matching system achieves a macro-average F1 score of 0.90, outperforming third-party solutions, which attain an F1 score of 0.83. Our findings show the potential of combining domain-specific adaptations with state-of-the-art machine learning techniques to mitigate duplicate listings in large-scale e-commerce environments.
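The memory figure in the abstract can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes one 128-dimensional vector per item stored as 32-bit floats, and it ignores index overhead such as centroids and ids. None of these assumptions is stated in the abstract. The 768-dimensional comparison is likewise only illustrative: 768 is the hidden size of BERT-base, and the abstract does not give the embedding width before reduction.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Bytes needed to hold num_vectors dense vectors of width dim.

    bytes_per_value=4 assumes float32 storage (an assumption, not from the source).
    """
    return num_vectors * dim * bytes_per_value


NUM_ITEMS = 200_000_000  # catalog size quoted in the abstract

compact = raw_vector_bytes(NUM_ITEMS, 128)  # reduced 128-d embeddings
wide = raw_vector_bytes(NUM_ITEMS, 768)     # illustrative BERT-base width (assumption)

print(f"128-d: {compact / 1e9:.1f} GB ({compact / 2**30:.1f} GiB)")
print(f"768-d: {wide / 1e9:.1f} GB ({wide / 2**30:.1f} GiB)")
print(f"reduction factor: {wide // compact}x")
```

Under these assumptions the raw 128-dimensional vectors occupy about 102.4 GB, which is in line with the roughly 100 GB quoted above. A 768-dimensional representation would need six times as much.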
