eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables
Luis Antonio Gutiérrez Guanilo, Mir Tafseer Nayeem, Cristian López, Davood Rafiei
TL;DR
This work introduces eC-Tab2Text, a domain-specific dataset for generating attribute-focused product descriptions from e-commerce tables, addressing the gap left by general-purpose table-to-text resources. It gathers price-and-spec tables and expert reviews from Pricebaba, serializes them in JSON, and pairs them with user-style queries to enable aspect-based generation. Fine-tuning open-source LLMs (LLaMA 2-Chat, Mistral 7B-Instruct, and StructLM 7B) on eC-Tab2Text yields substantial gains across standard text-generation metrics, along with correctness and faithfulness competitive with closed models. The study also evaluates robustness on out-of-domain QTSumm data. The findings highlight the importance of domain-specific datasets for industry-specific language generation and outline future directions in numerical reasoning and broader domain coverage.
Abstract
Large Language Models (LLMs) have demonstrated exceptional versatility across diverse domains, yet their application in e-commerce remains underexplored due to a lack of domain-specific datasets. To address this gap, we introduce eC-Tab2Text, a novel dataset designed to capture the intricacies of e-commerce, including detailed product attributes and user-specific queries. Leveraging eC-Tab2Text, we focus on text generation from product tables, enabling LLMs to produce high-quality, attribute-specific product reviews from structured tabular data. We evaluate the fine-tuned models using standard Table2Text metrics, alongside correctness, faithfulness, and fluency assessments. Our results demonstrate substantial improvements in generating contextually accurate reviews. This work highlights the potential of LLMs in e-commerce workflows and the essential role of domain-specific datasets in tailoring them to industry-specific challenges.
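As a rough illustration of the input format described above, the following minimal sketch serializes a product table to JSON and pairs it with a user-style query. The function name `build_prompt`, the prompt wording, and the product values are hypothetical and are not taken from the dataset or the authors' pipeline.

```python
import json


def build_prompt(table, query):
    """Serialize a product table to JSON and pair it with a user query.

    Hypothetical helper: the prompt wording is an assumption made for
    illustration, not the template used to build eC-Tab2Text.
    """
    table_json = json.dumps(table, indent=2, ensure_ascii=False)
    return (
        "Given the product table below (JSON), write a review that "
        "addresses the user query.\n\n"
        f"Table:\n{table_json}\n\n"
        f"Query: {query}\n"
    )


# Made-up product table; attribute names and values are illustrative only.
product = {
    "Display": {"Size": "6.5 inches", "Type": "AMOLED"},
    "Battery": {"Capacity": "5000 mAh"},
    "Camera": {"Rear": "50 MP"},
}

prompt = build_prompt(product, "How good are the display and the battery?")
print(prompt)
```

A fine-tuned model would receive a string like `prompt` as input and be trained to produce the review text for the aspects named in the query.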
