FITRep: Attention-Guided Item Representation via MLLMs
Guoxiao Zhang, Ao Li, Tan Qu, Qianlong Xie, Xingxing Wang
TL;DR
The paper tackles near-duplicate items arising from multimodal content by addressing the failure of black-box representations to respect item structure. It introduces FITRep, a white-box, attention-guided framework with three components: CHIE for hierarchical concept extraction using MLLMs, SPDR for structure-preserving dimensionality reduction via adaptive UMAP, and FBC for scalable FAISS-based clustering that weights elements by attention. Empirical results show superior offline duplicate detection (precision $88.1\%$, F1 $87.8\%$), improved CTR prediction (AUC $0.664$), and substantial online gains in CTR ($+3.60\%$) and CPM ($+4.25\%$) in Meituan’s advertising system. The work demonstrates the practical impact of interpretable, fine-grained multimodal representations for large-scale recommendation and deduplication tasks.
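The TL;DR reports precision and F1 but not recall. Assuming the reported F1 is the usual harmonic mean of a single aggregate precision $P$ and recall $R$ (it would not hold for a macro-averaged F1), the implied recall follows from rearranging the definition:

$$
F_1 = \frac{2PR}{P+R} \;\Longrightarrow\; R = \frac{F_1\,P}{2P - F_1} = \frac{0.878 \times 0.881}{2(0.881) - 0.878} = \frac{0.773518}{0.884} \approx 0.875
$$

Under that assumption, recall is roughly $87.5\%$, slightly below the reported precision.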
Abstract
Online platforms often suffer degraded user experience from near-duplicate items with similar visuals and text. While Multimodal Large Language Models (MLLMs) enable multimodal embedding, existing methods treat representations as black boxes and ignore structural relationships (e.g., primary vs. auxiliary elements), which leads to the local structural collapse problem. To address this, inspired by Feature Integration Theory (FIT), we propose FITRep, the first attention-guided, white-box item representation framework for fine-grained item deduplication. FITRep consists of: (1) Concept Hierarchical Information Extraction (CHIE), which uses MLLMs to extract hierarchical semantic concepts; (2) Structure-Preserving Dimensionality Reduction (SPDR), an adaptive UMAP-based method for efficient information compression; and (3) FAISS-Based Clustering (FBC), which uses FAISS to assign each item a unique cluster ID. Deployed on Meituan's advertising system, FITRep achieves +3.60% CTR and +4.25% CPM gains in online A/B tests, demonstrating both effectiveness and real-world impact.
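The idea of weighting an item's elements by attention before clustering can be illustrated with a minimal, stdlib-only sketch. It is not the authors' implementation, and every name in it is hypothetical. Assumptions: per-element embeddings and attention scores are already available (for example, from an MLLM); scores are converted to weights with a softmax; the SPDR (adaptive UMAP) step is omitted; and a brute-force greedy leader clustering with a cosine threshold stands in for a FAISS index.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def attention_pool(elements: Sequence[Sequence[float]], scores: Sequence[float]) -> Vector:
    """Fuse per-element embeddings into one item vector (hypothetical helper).

    Assumption: raw attention scores are turned into weights with a softmax,
    so a high-scoring (primary) element dominates the fused vector.
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(elements[0])
    fused = [sum(w * emb[d] for w, emb in zip(weights, elements)) for d in range(dim)]
    return l2_normalize(fused)


def assign_cluster_ids(items: Dict[str, Vector], threshold: float) -> Dict[str, int]:
    """Greedy leader clustering (brute-force stand-in for a FAISS index).

    Each item joins the most similar existing cluster leader whose cosine
    similarity is at least `threshold`; otherwise it founds a new cluster.
    """
    leaders: List[Vector] = []
    ids: Dict[str, int] = {}
    for item_id, vec in items.items():
        best_id, best_sim = -1, threshold
        for cid, leader in enumerate(leaders):
            sim = sum(a * b for a, b in zip(vec, leader))  # cosine: inputs are unit length
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id == -1:
            leaders.append(vec)
            best_id = len(leaders) - 1
        ids[item_id] = best_id
    return ids


# Toy catalog: (element embeddings, attention scores); the first element is primary.
catalog: Dict[str, Tuple[List[Vector], List[float]]] = {
    "A": ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [2.0, 0.0]),
    "B": ([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [2.0, 0.0]),  # same primary as A
    "C": ([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]], [2.0, 0.0]),  # A's elements, roles swapped
}

weighted = {k: attention_pool(e, s) for k, (e, s) in catalog.items()}
uniform = {k: attention_pool(e, [0.0] * len(s)) for k, (e, s) in catalog.items()}

print(assign_cluster_ids(weighted, 0.95))  # {'A': 0, 'B': 0, 'C': 1}
print(assign_cluster_ids(uniform, 0.95))   # {'A': 0, 'B': 1, 'C': 0}
```

In this toy example, attention weighting groups A and B (same primary element, different auxiliary element) and keeps C apart. Uniform pooling does the opposite: it separates A from B and merges A with C, whose elements are identical but whose primary and auxiliary roles are swapped. This is only a small analogue of the structural issue the abstract describes; the threshold of 0.95 is an arbitrary choice for the demo.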
