MMPKUBase: A Comprehensive and High-quality Chinese Multi-modal Knowledge Graph

Xuan Yi, Yanzeng Li, Lei Zou

TL;DR

MMPKUBase addresses the scarcity of high-quality Chinese multi-modal knowledge graphs by constructing a large-scale Chinese KG enriched with images across nine domains. The method combines Prototypical Contrastive Learning for robust image features with Isolation Forest-based filtering to ensure quality, resulting in 1,227,013 high-quality images associated with 52,180 entities. The framework formalizes a multi-modal KG $\text{G}_{mm}$, leverages two data sources (PKUBase and Baidu Image), and completes the data with RDF-based triples exposed via a user-friendly demonstration platform. This work provides a valuable resource for vision-language tasks and cross-modal reasoning in Chinese, with potential applications in visual question answering and recommendation. It also lays the groundwork for expanding domain coverage and deep integration into real-world systems.

Abstract

Multi-modal knowledge graphs have emerged as a powerful approach for information representation, combining data from different modalities such as text, images, and videos. While several such graphs have been constructed and have played important roles in applications like visual question answering and recommendation systems, challenges persist in their development. These include the scarcity of high-quality Chinese knowledge graphs and limited domain coverage in existing multi-modal knowledge graphs. This paper introduces MMPKUBase, a robust and extensive Chinese multi-modal knowledge graph that covers diverse domains, including birds, mammals, ferns, and more, comprising over 50,000 entities and over 1 million filtered images. To ensure data quality, we employ Prototypical Contrastive Learning and the Isolation Forest algorithm to refine the image data. Additionally, we have developed a user-friendly platform to facilitate image attribute exploration.
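
The filtering step named above can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal pure-Python isolation forest, under the assumption that each image of an entity has already been mapped to a feature vector (for example by a contrastively trained encoder). The function names, the `drop_fraction` parameter, and the synthetic vectors are all hypothetical choices made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def avg_path_length(n):
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for the leaf's size."""
    if "size" in tree:
        return depth + avg_path_length(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)

def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Score in (0, 1]; values closer to 1 mean easier to isolate."""
    rng = random.Random(seed)
    m = min(sample_size, len(points))
    if m < 2:
        return [0.0] * len(points)
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(points, m), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(m)
    return [2.0 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
            for x in points]

def filter_images(embeddings, drop_fraction=0.05, seed=0):
    """Return indices of embeddings kept after dropping the most anomalous."""
    scores = anomaly_scores(embeddings, seed=seed)
    n_drop = int(drop_fraction * len(embeddings))
    ranked = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(embeddings)) if i not in dropped]

# Synthetic stand-in for one entity's image features (hypothetical data):
# 200 consistent images plus 5 unrelated ones far from the main cluster.
demo_rng = random.Random(42)
inliers = [[demo_rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
outliers = [[demo_rng.choice([-1, 1]) * demo_rng.uniform(15, 25) for _ in range(4)]
            for _ in range(5)]
embeddings = inliers + outliers
kept = filter_images(embeddings)
```

Points that are separated from the rest after only a few random splits receive high scores, which is why images unrelated to an entity tend to be the ones dropped. In practice a library implementation such as scikit-learn's `IsolationForest` would be the more likely choice; the pure-Python version here only keeps the sketch dependency-free.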

Paper Structure

This paper contains 16 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of the MMPKUBase construction pipeline.
  • Figure 2: The results of filtering a collection of images for a specific architectural entity using the Isolation Forest method.
  • Figure 3: Query examples from the demonstration platform. The SPARQL query is designed to locate entities whose names contain the substring 'BMW' and retrieve their associated image attributes.
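
The kind of lookup described in the Figure 3 caption can be sketched as follows. The source does not give the platform's actual schema, so the predicate names (`rdfs:label`, `ex:image`), entity identifiers, and image paths below are hypothetical; the SPARQL text in the comment is only an assumed shape for such a query, and the Python function reproduces the same substring match over an in-memory list of triples.

```python
# An assumed shape for the SPARQL query (predicates are hypothetical):
#
#   SELECT ?entity ?image WHERE {
#     ?entity rdfs:label ?name .
#     ?entity ex:image ?image .
#     FILTER(CONTAINS(?name, "BMW"))
#   }

# Toy (subject, predicate, object) triples; all identifiers are made up.
TRIPLES = [
    ("ex:e1", "rdfs:label", "BMW X5"),
    ("ex:e1", "ex:image", "img/e1_001.jpg"),
    ("ex:e1", "ex:image", "img/e1_002.jpg"),
    ("ex:e2", "rdfs:label", "BMW 3 Series"),
    ("ex:e2", "ex:image", "img/e2_001.jpg"),
    ("ex:e3", "rdfs:label", "Audi A4"),
    ("ex:e3", "ex:image", "img/e3_001.jpg"),
]

def images_for_name_substring(triples, substring):
    """Map each entity whose label contains `substring` to its image values."""
    # First pass: collect entities whose label matches.
    matching = {s for s, p, o in triples
                if p == "rdfs:label" and substring in o}
    # Second pass: gather image attributes for those entities, in input order.
    result = {}
    for s, p, o in triples:
        if p == "ex:image" and s in matching:
            result.setdefault(s, []).append(o)
    return result

bmw_images = images_for_name_substring(TRIPLES, "BMW")
```

The two-pass structure mirrors the two triple patterns in the query: one binds the entity name, the other binds the image attribute of the same entity.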