Question-to-Knowledge (Q2K): Multi-Agent Generation of Inspectable Facts for Product Mapping

Wonduk Seo, Taesub Shin, Hyunjin An, Dokyun Kim, Seunghyun Lee

TL;DR

The paper tackles the persistent SKU mapping challenge in e-commerce by introducing Question-to-Knowledge (Q2K), a multi-agent framework that decomposes mapping into targeted disambiguation questions answered through focused web evidence. It combines a Reasoning Agent, a Knowledge Agent, and a Deduplication Agent with a human-in-the-loop mechanism to produce transparent, evidence-grounded mappings and reusable reasoning traces. Empirical results on a large, semi-manually collected dataset show that Q2K achieves state-of-the-art accuracy (0.9562) while reducing redundant searches via trace reuse, demonstrating both robustness and efficiency. The approach offers scalable, interpretable product integration across heterogeneous platforms and lays groundwork for retrieval-augmented reasoning in catalog management. By balancing accuracy, cost, and transparency, the framework is practical for real-world deployment in dynamic e-commerce environments.

Abstract

Identifying whether two product listings refer to the same Stock Keeping Unit (SKU) is a persistent challenge in e-commerce, especially when explicit identifiers are missing and product names vary widely across platforms. Rule-based heuristics and keyword similarity often misclassify products by overlooking subtle distinctions in brand, specification, or bundle configuration. To overcome these limitations, we propose Question-to-Knowledge (Q2K), a multi-agent framework that leverages Large Language Models (LLMs) for reliable SKU mapping. Q2K integrates: (1) a Reasoning Agent that generates targeted disambiguation questions, (2) a Knowledge Agent that resolves them via focused web searches, and (3) a Deduplication Agent that reuses validated reasoning traces to reduce redundancy and ensure consistency. A human-in-the-loop mechanism further refines uncertain cases. Experiments on real-world consumer goods datasets show that Q2K surpasses strong baselines, achieving higher accuracy and robustness in difficult scenarios such as bundle identification and brand origin disambiguation. By reusing retrieved reasoning instead of issuing repeated searches, Q2K balances accuracy with efficiency, offering a scalable and interpretable solution for product integration.

Paper Structure

This paper contains 27 sections, 11 equations, 1 figure, and 4 tables.

Figures (1)

  • Figure 1: Overview of the Q2K workflow. The Reasoning Agent generates disambiguation questions from a product pair, the Deduplication Agent retrieves the top-k reasoning traces and checks information gain, and the Knowledge Agent provides authoritative answers when the retrieved evidence is insufficient.
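
The reuse-or-search control flow described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Trace`, `TraceStore`, `token_overlap`, `resolve`, and the `reuse_threshold` parameter are hypothetical names, the token-overlap score is only a stand-in for whatever retrieval and information-gain check the paper uses, and `search` is a placeholder for the Knowledge Agent's web lookup.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """A stored question/answer pair (a reusable reasoning trace)."""
    question: str
    answer: str


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (an assumed stand-in similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class TraceStore:
    """Hypothetical store behind the Deduplication Agent."""
    traces: List[Trace] = field(default_factory=list)

    def top_k(self, question: str, k: int = 3) -> List[Trace]:
        # Rank stored traces by similarity to the new question, best first.
        ranked = sorted(
            self.traces,
            key=lambda t: token_overlap(question, t.question),
            reverse=True,
        )
        return ranked[:k]


def resolve(
    question: str,
    store: TraceStore,
    search: Callable[[str], str],
    k: int = 3,
    reuse_threshold: float = 0.8,  # assumed cutoff, not taken from the paper
) -> Tuple[str, bool]:
    """Return (answer, reused). Reuse a stored trace when it is close enough,
    otherwise fall back to the search callable and store the new trace."""
    candidates = store.top_k(question, k)
    if candidates:
        best = candidates[0]
        if token_overlap(question, best.question) >= reuse_threshold:
            return best.answer, True
    answer = search(question)
    store.traces.append(Trace(question, answer))
    return answer, False
```

In this sketch, a question generated by the Reasoning Agent would be passed to `resolve`; a repeated question is answered from the store without calling `search` again, which mirrors the redundancy reduction the abstract attributes to trace reuse.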