eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables

Luis Antonio Gutiérrez Guanilo, Mir Tafseer Nayeem, Cristian López, Davood Rafiei

TL;DR

This work introduces eC-Tab2Text, a domain-specific dataset for generating attribute-focused product descriptions from e-commerce tables, addressing the gap left by general-purpose table-to-text resources. It gathers price-and-spec tables and expert reviews from Pricebaba, serializes them in JSON, and pairs them with user-style queries to enable aspect-based generation. By fine-tuning open-source LLMs (LLaMA 2-Chat, Mistral 7B-Instruct, and StructLM 7B) on eC-Tab2Text, the study demonstrates substantial gains across standard text-generation metrics and competitive correctness and faithfulness compared to closed models, with robustness evaluations against out-of-domain QTSumm data. The findings highlight the importance of domain-specific datasets for industry-specific language generation and outline future directions in numerical reasoning and broader domain coverage.

Abstract

Large Language Models (LLMs) have demonstrated exceptional versatility across diverse domains, yet their application in e-commerce remains underexplored due to a lack of domain-specific datasets. To address this gap, we introduce eC-Tab2Text, a novel dataset designed to capture the intricacies of e-commerce, including detailed product attributes and user-specific queries. Leveraging eC-Tab2Text, we focus on text generation from product tables, enabling LLMs to produce high-quality, attribute-specific product reviews from structured tabular data. Fine-tuned models were evaluated using standard Table2Text metrics, alongside correctness, faithfulness, and fluency assessments. Our results demonstrate substantial improvements in generating contextually accurate reviews, underscoring the essential role of domain-specific datasets and fine-tuning in tailoring LLMs to e-commerce workflows and their industry-specific challenges.
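The setup described above, a product table serialized as JSON and paired with a user-style aspect query, can be illustrated with a minimal sketch. The function name `build_prompt`, the prompt wording, and the example table values are all hypothetical and chosen only for illustration; they are not taken from the dataset or from the authors' code. The aspect names "Camera" and "Design & Display" follow the examples given in the Figure 1 caption.

```python
import json


def build_prompt(product_table: dict, aspect: str) -> str:
    """Serialize a nested product table to JSON and attach an aspect query.

    The prompt template here is an assumption for illustration, not the
    template used in the paper.
    """
    table_json = json.dumps(product_table, indent=2, ensure_ascii=False)
    return (
        "Given the following product specification table in JSON, "
        f"write a short review focused on the aspect: {aspect}.\n\n"
        f"Table:\n{table_json}\n\nReview:"
    )


# Hypothetical specification table: aspect -> {attribute: value}
example_table = {
    "Camera": {"Rear Camera": "48 MP", "Front Camera": "16 MP"},
    "Design & Display": {"Screen Size": "6.5 inches", "Display Type": "AMOLED"},
}

prompt = build_prompt(example_table, "Camera")
print(prompt)
```

The resulting string could then be passed to any instruction-tuned model; keeping the table as JSON preserves the nesting of aspects and attributes, which a flattened text rendering would lose.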

Paper Structure

This paper contains 30 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Overview of eC-Tab2Text. Illustration of aspect-based text generation from e-commerce product tables, where an LLM generates summaries for user-specific aspects like "Camera" and "Design & Display."
  • Figure 2: Data collection pipeline for our eC-Tab2Text dataset.
  • Figure 3: An illustration of sample output texts generated for user-specific queries based on structured input from product tables.
  • Figure 4: An example of a product specification table structure.