Cross-Platform E-Commerce Product Categorization and Recategorization: A Multimodal Hierarchical Classification Approach

Lotte Gross; Rebecca Walter; Nicole Zoppi; Adrien Justus; Alessandro Gambetti; Qiwei Han; Maximilian Kaiser

TL;DR

The paper tackles cross-platform e-commerce product categorization with a multimodal hierarchical framework that fuses textual, visual, and vision-language features and uses dynamic masking to keep predictions consistent with the taxonomy. CLIP-based late fusion delivers the strongest hierarchical performance, while a two-stage deployment (a RoBERTa stage followed by a GPU-accelerated multimodal stage) balances accuracy and cost for industrial use. A self-supervised recategorization pipeline using SimCLR, UMAP, and cascade clustering discovers fine-grained subcategories (e.g., within Shoes) and generalizes across platforms, reducing manual taxonomy maintenance. Deployment at EURWEB indicates that scalable, robust, cross-platform categorization pipelines are practical in production, and the work outlines a path for evolving taxonomies as market trends change.

Abstract

This study addresses two critical industrial challenges in e-commerce product categorization, platform heterogeneity and the structural limitations of existing taxonomies, by developing and deploying a multimodal hierarchical classification framework. Using a dataset of 271,700 products from 40 international fashion e-commerce platforms, we integrate textual features (RoBERTa), visual features (ViT), and joint vision-language representations (CLIP). We investigate early, late, and attention-based fusion strategies within a hierarchical architecture that uses dynamic masking to ensure taxonomic consistency. Results show that CLIP embeddings combined via an MLP-based late-fusion strategy achieve the highest hierarchical F1 (98.59%), outperforming unimodal baselines. To address shallow or inconsistent categories, we further introduce a self-supervised "product recategorization" pipeline using SimCLR, UMAP, and cascade clustering, which discovered new, fine-grained categories (for example, subtypes of "Shoes") with cluster purities above 86%. Cross-platform experiments reveal a deployment-relevant trade-off: complex late-fusion methods maximize accuracy when training data are diverse, while simpler early-fusion methods generalize more effectively to unseen platforms. Finally, we demonstrate the framework's industrial scalability through deployment in EURWEB's commercial transaction intelligence platform via a two-stage inference pipeline that combines a lightweight RoBERTa stage with a GPU-accelerated multimodal stage to balance cost and accuracy.
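The abstract does not describe how dynamic masking is implemented. As a minimal sketch of the general idea, assuming a two-level taxonomy and a masking rule that restricts child-level predictions to the children of the predicted parent, the step could look like the following. The taxonomy mapping `CHILDREN_OF`, the function `masked_child_probs`, and the example logits are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical two-level taxonomy (assumption, not from the paper):
# parent category index -> indices of its valid child categories.
CHILDREN_OF = {
    0: [0, 1],     # e.g. a "Shoes"-like parent with two subtypes
    1: [2, 3, 4],  # e.g. a "Clothing"-like parent with three subtypes
}


def masked_child_probs(child_logits, parent_idx):
    """Softmax over child logits, restricted to children of parent_idx.

    Children that do not belong to the given parent receive a logit of
    -inf, so their probability is exactly zero after the softmax.
    """
    valid = set(CHILDREN_OF[parent_idx])
    masked = [
        logit if i in valid else -math.inf
        for i, logit in enumerate(child_logits)
    ]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in masked]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Illustrative logits (made up): the raw child argmax is index 2,
# which is not a child of the predicted parent 0.
parent_logits = [2.0, 0.5]
child_logits = [0.1, 1.2, 3.0, 0.4, 0.2]

parent = argmax(parent_logits)
probs = masked_child_probs(child_logits, parent)
child = argmax(probs)
```

In this toy case the unmasked child prediction would fall outside the predicted parent's subtree, while the masked prediction stays inside it. That is the kind of taxonomic consistency the abstract attributes to dynamic masking, though the paper's actual mechanism may differ.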
