Overview of the VLSP 2023 -- ComOM Shared Task: A Data Challenge for Comparative Opinion Mining from Vietnamese Product Reviews

Hoang-Quynh Le, Duy-Cat Can, Khanh-Vinh Nguyen, Mai-Vu Tran

TL;DR

The paper introduces the VLSP 2023 ComOM shared task, which targets Vietnamese comparative opinion mining by requiring the extraction of a quintuple (S, O, A, P, L), that is, Subject, Object, Aspect, Predicate, and Comparison Type Label, from product reviews. It presents the VCOM corpus, a human-annotated dataset of 120 documents with 2468 comparisons across 9225 sentences, and defines evaluation via exact-match macro-averaged quintuple F1. Two baselines are provided, Pipeline (LSTM-CRF-based extraction and validation) and Generative (BERT-based encoder–decoder), along with detailed post-processing and formatting rules. On the public set, structured pipelines performed better, while the more diverse private set favored generative approaches; the two families therefore have complementary strengths. The work contributes a dataset, an evaluation framework, and benchmark results for Vietnamese comparative opinion mining.

Abstract

This paper presents a comprehensive overview of the Comparative Opinion Mining from Vietnamese Product Reviews shared task (ComOM), held as part of the 10$^{th}$ International Workshop on Vietnamese Language and Speech Processing (VLSP 2023). The primary objective of this shared task is to advance the field of natural language processing by developing techniques that proficiently extract comparative opinions from Vietnamese product reviews. Participants are challenged to propose models that adeptly extract a comparative "quintuple" from a comparative sentence, encompassing Subject, Object, Aspect, Predicate, and Comparison Type Label. We construct a human-annotated dataset comprising $120$ documents, encompassing $7427$ non-comparative sentences and $2468$ comparisons within $1798$ sentences. Participating models undergo evaluation and ranking based on the Exact match macro-averaged quintuple F1 score.
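
To make the evaluation metric concrete, the following is a minimal sketch of an exact-match macro-averaged quintuple F1. It assumes that a predicted quintuple counts as correct only when all five elements match a gold quintuple of the same sentence, and that the macro average is taken over comparison type labels; the abstract does not spell out either detail, so both are assumptions. The names `Quintuple`, `f1`, and `macro_quintuple_f1` are illustrative, and the labels `TYPE_A` and `TYPE_B` in the usage note are placeholders, not labels from the task.

```python
from collections import defaultdict
from typing import List, NamedTuple, Set


class Quintuple(NamedTuple):
    """One comparison: Subject, Object, Aspect, Predicate, Comparison Type Label."""
    subject: str
    obj: str
    aspect: str
    predicate: str
    label: str


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """F1 from counts; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def macro_quintuple_f1(gold: List[Set[Quintuple]],
                       pred: List[Set[Quintuple]]) -> float:
    """Exact-match F1 per label, averaged over labels (assumed macro scheme).

    gold[i] and pred[i] hold the quintuples of sentence i.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")

    tp = defaultdict(int)
    n_pred = defaultdict(int)
    n_gold = defaultdict(int)

    for gold_set, pred_set in zip(gold, pred):
        for q in gold_set:
            n_gold[q.label] += 1
        for q in pred_set:
            n_pred[q.label] += 1
            # Exact match: all five elements must be identical.
            if q in gold_set:
                tp[q.label] += 1

    labels = set(n_gold) | set(n_pred)
    if not labels:
        return 0.0
    return sum(f1(tp[l], n_pred[l], n_gold[l]) for l in labels) / len(labels)
```

For example, with two sentences where the first prediction matches the gold quintuple exactly (label `TYPE_A`) and the second differs only in the predicate (label `TYPE_B`), the per-label F1 scores are 1.0 and 0.0, so the macro average is 0.5. A single wrong element makes the whole quintuple count as a miss, which is why exact-match scoring is strict.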

Paper Structure (19 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Multi-comparison statistic.
  • Figure 2: Comparison type labels statistic.
  • Figure 3: Overall architecture of the baseline model.