Unlocking Markets: A Multilingual Benchmark to Cross-Market Question Answering
Yifei Yuan, Yang Deng, Anders Søgaard, Mohammad Aliannejadi
TL;DR
The paper introduces Multilingual Cross-market Product-based Question Answering (MCPQA), a task that answers questions in a resource-scarce main marketplace by leveraging a resource-rich auxiliary marketplace in a multilingual context. It builds a large-scale dataset with over 7 million questions across 17 marketplaces and 11 languages, automatically translates the non-English Electronics content into English to create McMarket, and uses an LLM to annotate two subsets, McMarket_r and McMarket_q, whose quality is checked by human assessment. The study investigates two subtasks, review-based answer generation (AG) and product-related question ranking (QR), and benchmarks models ranging from traditional lexical models to LLMs in single-market and cross-market settings. Results show that incorporating cross-market information significantly improves performance on both subtasks. The work also offers insights into LLM-assisted data annotation and multilingual challenges, and provides a real-world dataset for multilingual cross-market product QA research.
Abstract
Users post numerous product-related questions on e-commerce platforms, and the answers affect their purchase decisions. Product-related question answering (PQA) uses product-related resources to provide precise responses to users. We propose a novel task, Multilingual Cross-market Product-based Question Answering (MCPQA), defined as answering product-related questions in a main marketplace by utilizing information from another, resource-rich auxiliary marketplace in a multilingual context. We introduce a large-scale dataset comprising over 7 million questions from 17 marketplaces across 11 languages. We then automatically translate the Electronics category of our dataset and name the result McMarket. We focus on two subtasks: review-based answer generation and product-related question ranking. For each subtask, we label a subset of McMarket using an LLM and evaluate the quality of the annotations via human assessment. We then conduct experiments to benchmark our dataset, using models ranging from traditional lexical models to LLMs in both single-market and cross-market scenarios across McMarket and the corresponding LLM-annotated subsets. Results show that incorporating cross-market information significantly enhances performance in both tasks.
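To make the difference between the single-market and cross-market settings concrete, the following is a minimal sketch of the question ranking subtask, not the paper's implementation. It assumes all questions have already been translated to English, and it uses a simple token-overlap (Jaccard) score as a stand-in for the traditional lexical models mentioned above. The `Candidate` type, the marketplace codes, and the example questions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    market: str    # hypothetical marketplace code, e.g. "fr" or "us"
    question: str  # question text, assumed already translated to English


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (tok.strip("?.,!").lower() for tok in text.split())
    return {tok for tok in tokens if tok}


def jaccard(a: str, b: str) -> float:
    """Token-overlap score in [0, 1]; a stand-in for a lexical ranker."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_questions(query, main_pool, aux_pool=(), top_k=3):
    """Rank candidate questions against the query.

    Single-market setting: pass only main_pool.
    Cross-market setting: also pass aux_pool, which is simply pooled
    with the main-marketplace candidates before scoring.
    """
    candidates = list(main_pool) + list(aux_pool)
    ranked = sorted(candidates,
                    key=lambda c: jaccard(query, c.question),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical example data: a small main marketplace and a richer auxiliary one.
main_pool = [Candidate("fr", "Is the cable included?")]
aux_pool = [
    Candidate("us", "Does this camera work with a tripod?"),
    Candidate("us", "What is the battery life?"),
]
query = "Does the camera work with a tripod?"

single_market = rank_questions(query, main_pool)
cross_market = rank_questions(query, main_pool, aux_pool)
```

In this toy case the single-market ranking can only return the unrelated cable question, while the cross-market ranking places the auxiliary tripod question first, which is the kind of gain the task definition is designed to capture.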
