MMPKUBase: A Comprehensive and High-quality Chinese Multi-modal Knowledge Graph

Xuan Yi; Yanzeng Li; Lei Zou

TL;DR

MMPKUBase addresses the scarcity of high-quality Chinese multi-modal knowledge graphs by constructing a large-scale Chinese KG enriched with images across nine domains. The method combines Prototypical Contrastive Learning (PCL), which yields robust image features, with Isolation Forest-based filtering to ensure quality. The result is 1,227,013 high-quality images associated with 52,180 entities. The framework formalizes a multi-modal KG $\text{G}_{mm}$, draws on two data sources (PKUBase and Baidu Image), and completes the graph with RDF triples that are exposed through a user-friendly demonstration platform. This work provides a valuable resource for vision-language tasks and cross-modal reasoning in Chinese, with potential applications in visual question answering and recommendation. It also lays the groundwork for broader domain coverage and deeper integration into real-world systems.
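
The TL;DR credits Prototypical Contrastive Learning (PCL) with producing robust image features. As a hedged illustration of that objective, the sketch below computes only the prototype-level term of the ProtoNCE loss from the original PCL paper (Li et al., ICLR 2021) for a single clustering, together with that paper's per-cluster concentration estimate. The encoder, the number of clusters, the instance-level term, and all function names here are assumptions; this is not MMPKUBase's training code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (PCL works on normalized embeddings)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def estimate_concentration(points: Sequence[Vector], centroid: Vector, alpha: float = 10.0) -> float:
    """phi = sum_z ||v_z - c||_2 / (Z * log(Z + alpha)); alpha = 10 follows the PCL paper.
    Smaller phi means a tighter cluster and therefore a sharper softmax."""
    z = len(points)
    dist_sum = sum(math.dist(p, centroid) for p in points)
    return dist_sum / (z * math.log(z + alpha))


def prototype_loss(
    embeddings: Sequence[Vector],
    prototypes: Sequence[Vector],
    assignments: Sequence[int],
    concentrations: Sequence[float],
) -> float:
    """Prototype term of ProtoNCE for one clustering (illustrative only):
    mean over samples of -log softmax_j(v . c_j / phi_j) at the assigned prototype."""
    protos = [l2_normalize(c) for c in prototypes]
    total = 0.0
    for v, s in zip(embeddings, assignments):
        v = l2_normalize(v)
        logits = [dot(v, c) / phi for c, phi in zip(protos, concentrations)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[s]
    return total / len(embeddings)
```

Embeddings that sit close to their assigned prototype give a loss near zero, while mismatched assignments are penalized heavily; in full PCL this term is averaged over several clusterings of different granularity and added to an instance-level InfoNCE term.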

Abstract

Multi-modal knowledge graphs have emerged as a powerful approach for information representation, combining data from different modalities such as text, images, and videos. While several such graphs have been constructed and have played important roles in applications like visual question answering and recommendation systems, challenges persist in their development. These include the scarcity of high-quality Chinese knowledge graphs and limited domain coverage in existing multi-modal knowledge graphs. This paper introduces MMPKUBase, a robust and extensive Chinese multi-modal knowledge graph that covers diverse domains, including birds, mammals, ferns, and more, comprising over 50,000 entities and over 1 million filtered images. To ensure data quality, we employ Prototypical Contrastive Learning and the Isolation Forest algorithm to refine the image data. Additionally, we have developed a user-friendly platform to facilitate image attribute exploration.
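
The abstract names the Isolation Forest algorithm as the image-refinement step but gives none of its settings. The following is a minimal pure-Python sketch of the standard algorithm (Liu, Ting and Zhou, 2008), assuming that each image retrieved for one entity is already represented by a feature vector (for example a PCL embedding) and that a fixed fraction of the highest-scoring images is discarded. The `contamination`, `n_trees` and `sample_size` values and every helper name are hypothetical choices, not MMPKUBase's.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]
EULER_GAMMA = 0.5772156649015329


def avg_path(n: int) -> float:
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


class Node:
    def __init__(self, size: int, dim: Optional[int] = None, split: float = 0.0,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.size, self.dim, self.split = size, dim, split
        self.left, self.right = left, right


def build_tree(points: List[Vector], depth: int, max_depth: int, rng: random.Random) -> Node:
    """Grow an isolation tree with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return Node(len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all points share this value, so no split is possible
        return Node(len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return Node(len(points), dim, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Vector, node: Node, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(leaf size) for points left unresolved."""
    if node.dim is None:
        return depth + avg_path(node.size)
    child = node.left if x[node.dim] < node.split else node.right
    return path_length(x, child, depth + 1)


def anomaly_scores(data: List[Vector], n_trees: int = 100,
                   sample_size: int = 256, seed: int = 0) -> List[float]:
    """s(x) = 2 ** (-E[h(x)] / c(psi)); scores close to 1 mark likely outliers."""
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two feature vectors")
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm) for x in data]


def filter_images(features: List[Vector], contamination: float = 0.1, **kwargs) -> List[int]:
    """Hypothetical per-entity rule (assumption): drop the top `contamination`
    fraction of images by anomaly score and return the indices that are kept."""
    scores = anomaly_scores(features, **kwargs)
    n_drop = round(contamination * len(features))
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(features)) if i not in dropped]
```

Images whose features lie far from the bulk of an entity's images are isolated after only a few random splits, so they receive short paths and high scores. In practice one would probably use `sklearn.ensemble.IsolationForest`, whose `contamination` parameter plays the same role; the hand-written version only makes the scoring rule explicit.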

Paper Structure (16 sections, 3 figures, 1 table)

Introduction
Related Work
Method
Definition
Framework Overview
Data Acquisition
Data Sources
Entity Selection
Image Retrieval
Image Filtering
PCL Feature Generation
Image Selection
Triple Completion
Statistics
Demonstration
...and 1 more section

Figures (3)

Figure 1: The overview construction pipeline of MMPKUBase
Figure 2: The results of filtering a collection of images for a specific architectural entity using the Isolation Forest method.
Figure 3: Query examples from the demonstration platform. The SPARQL query is designed to locate entities whose names contain the substring 'BMW' and retrieve their associated image attributes.
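
Figure 3 describes a SPARQL query that finds entities whose names contain 'BMW' and returns their image attributes. The sketch below assumes a hypothetical schema in which names and images are attached with predicates written here as `<name>` and `<image>`; MMPKUBase's actual predicate IRIs are not given in this summary. It builds such a query string and, for comparison, evaluates the same graph pattern over a small in-memory list of made-up triples.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical predicate names; the real MMPKUBase IRIs are not shown here.
NAME_PRED = "<name>"
IMAGE_PRED = "<image>"


def build_image_query(substring: str, name_pred: str = NAME_PRED,
                      image_pred: str = IMAGE_PRED) -> str:
    """SPARQL 1.1 query: entities whose name contains `substring`, with their images."""
    literal = substring.replace("\\", "\\\\").replace('"', '\\"')  # escape for a SPARQL string literal
    return (
        "SELECT ?entity ?image WHERE {\n"
        f"  ?entity {name_pred} ?name .\n"
        f"  ?entity {image_pred} ?image .\n"
        f'  FILTER(CONTAINS(STR(?name), "{literal}"))\n'
        "}"
    )


def match_images(triples: List[Triple], substring: str, name_pred: str = NAME_PRED,
                 image_pred: str = IMAGE_PRED) -> List[Tuple[str, str]]:
    """Evaluate the same pattern and FILTER over an in-memory triple list."""
    hits = {s for s, p, o in triples if p == name_pred and substring in o}
    return sorted((s, o) for s, p, o in triples if p == image_pred and s in hits)


# Toy triples: hypothetical entities and image paths, for illustration only.
TRIPLES: List[Triple] = [
    ("<BMW_X5>", NAME_PRED, "BMW X5"),
    ("<BMW_X5>", IMAGE_PRED, "img/003.jpg"),
    ("<Audi_A4>", NAME_PRED, "Audi A4"),
    ("<Audi_A4>", IMAGE_PRED, "img/002.jpg"),
    ("<BMW_X5>", IMAGE_PRED, "img/001.jpg"),
]

if __name__ == "__main__":
    print(build_image_query("BMW"))
    print(match_images(TRIPLES, "BMW"))
```

Against the demonstration platform, the query string would instead be sent to its SPARQL service; the in-memory function only mirrors the triple patterns and the `CONTAINS` filter.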
