MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization

Tao Chen; Ze Lin; Hui Li; Jiayi Ji; Yiyi Zhou; Guanbin Li; Rongrong Ji

TL;DR

This work proposes an end-to-end multi-grained multi-modal attribute-aware product summarization method (MMAPS) for generating high-quality product summaries in e-commerce and designs several multi-grained multi-modal tasks to better guide the multi-modal learning of MMAPS.

Abstract

Given long textual product information and a product image, Multi-modal Product Summarization (MPS) aims to increase customers' desire to purchase by highlighting product characteristics in a short textual summary. Existing MPS methods produce promising results, but they still lack 1) end-to-end product summarization, 2) multi-grained multi-modal modeling, and 3) multi-modal attribute modeling. To improve MPS, we propose an end-to-end multi-grained multi-modal attribute-aware product summarization method (MMAPS) for generating high-quality product summaries in e-commerce. MMAPS jointly models product attributes and generates product summaries. We design several multi-grained multi-modal tasks to better guide the multi-modal learning of MMAPS. Furthermore, we model product attributes based on both the text and image modalities so that multi-modal product characteristics are manifested in the generated summaries. Extensive experiments on a real large-scale Chinese e-commerce dataset demonstrate that our model outperforms state-of-the-art product summarization methods w.r.t. several summarization metrics. Our code is publicly available at: https://github.com/KDEGroup/MMAPS.
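
The abstract describes joint training on several tasks, and the outline and Figure 4 refer to multi-task learning with task weights. A common way to combine such objectives is a weighted sum of per-task losses. The sketch below illustrates that idea only; it is not taken from the MMAPS code. The function name, the task keys (borrowed from the section titles), and all numeric values are hypothetical.

```python
from typing import Dict


def combine_task_losses(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return a weighted sum of per-task losses.

    Assumption: tasks without an explicit weight default to 1.0.
    This is an illustrative convention, not the paper's stated formula.
    """
    unknown = set(weights) - set(losses)
    if unknown:
        raise KeyError(f"weights given for unknown tasks: {sorted(unknown)}")
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())


# Hypothetical loss values for one training step; keys mirror the section titles.
losses = {
    "product_summarization": 2.0,
    "masked_region_modeling": 0.5,
    "multi_grained_multi_modal_modeling": 1.0,
}
# Hypothetical task weights; the summarization task keeps the default weight of 1.0.
weights = {
    "masked_region_modeling": 0.2,
    "multi_grained_multi_modal_modeling": 0.5,
}
total = combine_task_losses(losses, weights)
```

Under this assumed scheme, a sensitivity analysis like the one in Figure 4 would amount to varying the entries of `weights` and observing the effect on the summarization metrics.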

Paper Structure (21 sections, 7 equations, 4 figures, 3 tables)

Introduction
Related Work
Product Summarization
Multi-Modal Self-Supervised Learning
Our Method MMAPS
Architecture Design
Text Encoding
Image Encoding
Text-Image Fusion
Multi-Modal Multi-task Learning
Product Summarization
Masked Region Modeling
Multi-grained Multi-modal Modeling
Putting All Together
Experiment
...and 6 more sections

Figures (4)

Figure 1: An example of a product in the CEPSUM dataset [LiYXWHZ20]. Product attributes are shown in red.
Figure 2: Overview of MMAPS. Chinese product information has been translated into English.
Figure 3: A comparison between the product summaries generated by MMAPS and V2P. GT indicates ground truth. The English texts are translated from the corresponding Chinese texts. The same or semantically similar descriptions are highlighted in red. The product appearances manifested in the image are underlined.
Figure 4: Sensitivity analysis of task weights on Cases & Bags.
