Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs

Soham Satyadharma, Fatemeh Sheikholeslami, Swati Kaul, Aziz Umit Batur, Suleiman A. Khan

TL;DR

The paper tackles scalable product-catalog quality assessment without training labels by introducing a training-free auto-prompt cascade that automatically generates and refines instructions for product category-attribute (PC-SA) pairs. It bootstraps from a small set of human-written prompts and iteratively creates domain-specific instructions that guide LLMs on correctness and applicability tasks across tens of thousands of PC-SA pairs and multiple languages. The approach achieves consistent 8–10% gains in precision and recall and a 99% reduction in expert prompting effort, while generalizing across languages and several foundational LLMs. The work demonstrates a practical, scalable method for domain-adapted prompt synthesis in e-commerce, suggests extensions to other catalog tasks, and points to a need for intrinsic instruction-quality metrics.

Abstract

We introduce a novel, training-free cascade for auto-prompting Large Language Models (LLMs) to assess product quality in e-commerce. Our system requires no training labels or model fine-tuning, instead automatically generating and refining prompts for evaluating attribute quality across tens of thousands of product category-attribute pairs. Starting from a seed of human-crafted prompts, the cascade progressively optimizes instructions to meet catalog-specific requirements. This approach bridges the gap between general language understanding and domain-specific knowledge at scale in complex industrial catalogs. Our extensive empirical evaluations show that the auto-prompt cascade improves precision and recall by $8-10\%$ over traditional chain-of-thought prompting. Notably, it achieves these gains while reducing domain expert effort from 5.1 hours to 3 minutes per attribute, a $99\%$ reduction. Additionally, the cascade generalizes effectively across five languages and multiple quality assessment tasks, consistently maintaining performance gains.
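As a quick sanity check on the effort figure quoted in the abstract, the reduction can be recomputed from the two stated durations. The function name `effort_reduction` is ours, introduced only for illustration.

```python
def effort_reduction(before_hours: float, after_minutes: float) -> float:
    """Fractional reduction in expert effort per attribute."""
    before_minutes = before_hours * 60  # 5.1 hours is about 306 minutes
    return 1 - after_minutes / before_minutes


# Figures taken from the abstract: 5.1 hours before, 3 minutes after.
print(f"{effort_reduction(5.1, 3):.1%}")  # prints 99.0%
```

The result rounds to the $99\%$ reported in the abstract.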

Paper Structure

This paper contains 25 sections, 6 equations, 8 figures, 9 tables, 1 algorithm.

Figures (8)

  • Figure 1: Auto-prompt LLM cascade for quality measurement. The cascade takes metadata and manually created few-shot examples to iteratively generate prompts, subsequently used to classify the attribute value.
  • Figure 2: Example of the auto-prompting cascade for instruction generation across two iterations, with the PC denoted as $p_a$ and the attribute as $s_b$. Iteration 1 uses $M$ manually created global PC-SA instructions and is repeated for $M$ PC definitions, generating $M$ PC-SA instructions for the attribute $s_b$. Iteration 2 combines these examples with the definitions of $p_a$ and $s_b$ to produce the PC-SA instruction for $p_a$ - $s_b$.
  • Figure 3: F1 score comparison on the incorrect class for four LLMs on the correctness task: Mixtral 8x7B, Mixtral 8x22B, DeepSeek R1 Distill Qwen 32B, and Claude 3.5 Sonnet. The top and bottom rows show F1 scores across the number of iterations and the number of few-shot examples, respectively. Iteration 0 denotes the performance of CoT prompting. The dotted line marks the best iteration for at least three of the four LLMs.
  • Figure 4: Baseline prompt template for the attribute base material.
  • Figure 5: CoT prompt template for the product category walking stick and attribute base material.
  • ...and 3 more figures
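
The two-iteration flow described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the prompt layout in `build_prompt`, the `cascade` signature, and the `llm` callable (any function mapping a prompt string to a completion string) are all hypothetical.

```python
from typing import Callable, List, Tuple

# One few-shot example: (PC definition, attribute definition, instruction).
Example = Tuple[str, str, str]


def build_prompt(examples: List[Example], pc_def: str, attr_def: str) -> str:
    """Assemble a few-shot prompt asking for a PC-SA instruction (assumed layout)."""
    shots = "\n\n".join(
        f"Product category: {pc}\nAttribute: {attr}\nInstruction: {inst}"
        for pc, attr, inst in examples
    )
    return f"{shots}\n\nProduct category: {pc_def}\nAttribute: {attr_def}\nInstruction:"


def cascade(
    llm: Callable[[str], str],
    seed_examples: List[Example],   # M manually created global PC-SA instructions
    example_pc_defs: List[str],     # M PC definitions used in iteration 1
    target_pc_def: str,             # definition of p_a
    attr_def: str,                  # definition of s_b
) -> str:
    # Iteration 1: from the global seeds, generate one instruction for the
    # attribute s_b under each of the M example product categories.
    iteration_1 = [
        (pc_def, attr_def, llm(build_prompt(seed_examples, pc_def, attr_def)))
        for pc_def in example_pc_defs
    ]
    # Iteration 2: use those attribute-specific instructions as few-shot
    # examples to produce the instruction for the target pair p_a - s_b.
    return llm(build_prompt(iteration_1, target_pc_def, attr_def))
```

With $M$ example categories, this sketch makes $M + 1$ LLM calls per target pair; in practice the iteration 1 outputs for an attribute could presumably be cached and reused across product categories.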