Overview of the VLSP 2023 -- ComOM Shared Task: A Data Challenge for Comparative Opinion Mining from Vietnamese Product Reviews
Hoang-Quynh Le, Duy-Cat Can, Khanh-Vinh Nguyen, Mai-Vu Tran
TL;DR
The paper introduces the VLSP 2023 ComOM shared task, which targets Vietnamese comparative opinion mining by requiring extraction of a quintuple (S,O,A,P,L) from product reviews. It presents the VCOM corpus, a human-annotated dataset of 120 documents and 9225 sentences, of which 1798 are comparative and contain 2468 comparisons, and defines evaluation via exact-match macro-averaged quintuple F1. Two baselines are provided, Pipeline (LSTM-CRF-based extraction and validation) and Generative (BERT-based encoder–decoder), together with detailed post-processing and formatting rules. Results on the public set favor structured pipelines, while the more diverse private set favors generative approaches, which highlights their complementary strengths and establishes benchmarks for Vietnamese comparative opinion mining. The work contributes a dataset, an evaluation framework, and benchmark results to advance research in Vietnamese NLP and opinion mining.
Abstract
This paper presents a comprehensive overview of the Comparative Opinion Mining from Vietnamese Product Reviews shared task (ComOM), held as part of the 10$^{th}$ International Workshop on Vietnamese Language and Speech Processing (VLSP 2023). The primary objective of this shared task is to advance natural language processing by developing techniques that extract comparative opinions from Vietnamese product reviews. Participants are challenged to propose models that extract a comparative "quintuple" from a comparative sentence, consisting of Subject, Object, Aspect, Predicate, and Comparison Type Label. We construct a human-annotated dataset of $120$ documents, containing $7427$ non-comparative sentences and $2468$ comparisons within $1798$ comparative sentences. Participating models are evaluated and ranked by the exact-match macro-averaged quintuple F1 score.
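To make the ranking metric concrete, the following is a minimal sketch of an exact-match macro-averaged quintuple F1, not the official scorer. It assumes that a predicted quintuple counts as correct only if all five elements match a gold quintuple in the same sentence, and that the macro average is taken over comparison type labels; the abstract does not spell out either detail. The names `Quintuple`, `f1`, and `macro_quintuple_f1`, the sentence ids, and the labels `"L1"` and `"L2"` are hypothetical placeholders.

```python
from typing import NamedTuple, Set, Tuple


class Quintuple(NamedTuple):
    subject: str
    obj: str
    aspect: str
    predicate: str
    label: str  # comparison type label


# Each item pairs a (hypothetical) sentence id with one quintuple.
Item = Tuple[int, Quintuple]


def f1(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from counts, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_quintuple_f1(gold: Set[Item], pred: Set[Item]) -> float:
    """Average per-label F1, counting only exact quintuple matches.

    Assumption: macro-averaging is over the labels seen in gold or pred.
    """
    labels = sorted({q.label for _, q in gold | pred})
    scores = []
    for label in labels:
        g = {item for item in gold if item[1].label == label}
        p = {item for item in pred if item[1].label == label}
        # Set intersection enforces exact match on all five elements.
        scores.append(f1(len(g & p), len(p - g), len(g - p)))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: the second prediction differs in the predicate, so it is wrong.
gold = {
    (0, Quintuple("phone A", "phone B", "battery", "better than", "L1")),
    (1, Quintuple("phone A", "phone B", "price", "same as", "L2")),
}
pred = {
    (0, Quintuple("phone A", "phone B", "battery", "better than", "L1")),
    (1, Quintuple("phone A", "phone B", "price", "same", "L2")),
}
score = macro_quintuple_f1(gold, pred)
```

In this toy case label `L1` scores an F1 of 1.0 and label `L2` scores 0.0, so the macro average is 0.5. A single mismatched element is enough to turn a prediction into both a false positive and a false negative, which is what makes exact-match scoring strict.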
