Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce

Zhantao Yang, Han Zhang, Fangyi Chen, Anudeepsekhar Bolimera, Marios Savvides

TL;DR

This paper proposes a novel method for constructing structured product knowledge graphs from raw product images. It combines recent advances in vision-language models (VLMs) and large language models (LLMs) to fully automate the process and allow timely graph updates.

Abstract

Knowledge graphs (KGs) play an increasingly important role in various AI systems. For e-commerce, an efficient, low-cost, automated knowledge graph construction method is the foundation for many downstream applications. In this paper, we propose a novel method for constructing structured product knowledge graphs from raw product images. The method combines recent advances in vision-language models (VLMs) and large language models (LLMs), fully automating the process and allowing timely graph updates. We also present a human-annotated e-commerce product dataset for benchmarking product property extraction in knowledge graph construction. Our method outperforms our baseline on all metrics and evaluated properties, demonstrating its effectiveness and its potential for practical use.

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Method Overview. Stage 1: An empty graph is first initialized with target properties and their corresponding data types. Stage 2: For each product, information is extracted with VLMs, then organized and improved by an LLM.
  • Figure 2: The time taken to generate the KG for an image remains similar as the number of images in the inventory increases.
  • Figure 3: Example KG subgraph of 3 enrolled products.
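
The two stages named in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `ProductGraph`, `init_graph`, `enroll_product`, `fake_vlm`, and `fake_llm` are hypothetical, the graph is stored as flat (product, property, value) triples rather than a hierarchy, and the VLM and LLM are replaced by canned stand-ins so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical schema: target property name -> expected data type.
Schema = Dict[str, type]


@dataclass
class ProductGraph:
    schema: Schema
    # Flat (product_id, property, value) triples; an assumption of this sketch.
    triples: List[Tuple[str, str, Any]] = field(default_factory=list)


def init_graph(schema: Schema) -> ProductGraph:
    """Stage 1: an empty graph that only knows the target properties and types."""
    return ProductGraph(schema=dict(schema))


def enroll_product(
    graph: ProductGraph,
    product_id: str,
    image: Any,
    vlm_extract: Callable[[Any, List[str]], Dict[str, str]],
    llm_refine: Callable[[Dict[str, str], Schema], Dict[str, Any]],
) -> ProductGraph:
    """Stage 2: extract raw values with a VLM, then organize them with an LLM."""
    raw = vlm_extract(image, list(graph.schema))
    refined = llm_refine(raw, graph.schema)
    for prop, value in refined.items():
        expected = graph.schema.get(prop)
        # Keep only values that match the declared data type.
        if expected is not None and isinstance(value, expected):
            graph.triples.append((product_id, prop, value))
    return graph


def fake_vlm(image: Any, properties: List[str]) -> Dict[str, str]:
    """Stand-in for a VLM call: returns canned, noisy strings."""
    canned = {"color": " Red ", "price": "19.99", "brand": "Acme"}
    return {p: canned[p] for p in properties if p in canned}


def fake_llm(raw: Dict[str, str], schema: Schema) -> Dict[str, Any]:
    """Stand-in for an LLM call: trims text and casts to the schema type."""
    refined: Dict[str, Any] = {}
    for prop, expected in schema.items():
        if prop not in raw:
            continue
        try:
            value = expected(raw[prop].strip())
        except ValueError:
            continue  # Drop values that cannot be cast.
        refined[prop] = value.lower() if isinstance(value, str) else value
    return refined


graph = init_graph({"color": str, "price": float})
enroll_product(graph, "p1", image=None, vlm_extract=fake_vlm, llm_refine=fake_llm)
```

Because each product is enrolled independently against a fixed schema, the per-image work in this sketch does not depend on how many products are already in the graph, which is the behavior the Figure 2 caption reports for the paper's method.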