Hedonic Prices and Quality Adjusted Price Indices Powered by AI

Patrick Bajari, Zhihao Cen, Victor Chernozhukov, Manoj Manukonda, Suhas Vijaykumar, Jin Wang, Ramon Huerta, Junbo Li, Ling Leng, George Monokroussos, Shan Wang

TL;DR

The paper tackles inflation measurement for high-turnover, heterogeneous online goods by introducing AI-powered hedonic pricing that uses unstructured text and image data to predict prices. It develops a multi-task neural network that produces a time series of hedonic prices via a value embedding derived from BERT-based text embeddings and ResNet-50 image embeddings. The main contributions are (i) scalable, automatic generation of product characteristics from unstructured data, (ii) a demonstration of high predictive accuracy (hold-out $R^2$ of $80\%$–$90\%$), and (iii) construction of Fisher hedonic price indices (FHPI) for apparel, showing quality-adjusted inflation closer to the CPI than alternatives. The results suggest that AI embeddings can power real-time, quality-adjusted price indices with practical advantages over traditional, manually curated hedonic models, and they point to future work on better image utilization and explainability.

The approach reframes hedonic pricing as a prediction problem: prices are regressed on product attributes $X_i$ derived from text and image embeddings, with the price function allowed to vary over time as $P_{it}=H_{it}=h_t(X_i)$. Using deep neural networks, the authors estimate $h_t$ nonlinearly and then quantify uncertainty via a hold-out linear stage on a fixed value embedding $V_i$, which enables standard inference. The empirical application to Amazon apparel demonstrates strong out-of-sample fit and yields an FHPI that indicates a modest decline in apparel prices over 2014–2019, in contrast to the larger declines suggested by some online indices, while reducing chain-drift bias through long-horizon chaining. The study contributes to the literature by showing how AI-based embeddings can modernize hedonic price indices using electronic data, offering a scalable, transparent alternative to manual hedonic feature construction.
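
The two-stage structure described above can be restated compactly. The display below is only a sketch in partly assumed notation: the coefficient vector $\theta_t$, the prediction error $\varepsilon_{it}$, and the hold-out index set $\mathcal{H}_t$ are labels introduced here for illustration, and the paper's own specification (for example, any transformation of prices or weighting of observations) may differ.

$$
P_{it} = h_t(X_i) + \varepsilon_{it}, \qquad h_t(X_i) \approx \theta_t^\top V_i, \qquad V_i = V(X_i),
$$

$$
\hat{\theta}_t = \arg\min_{\theta} \sum_{i \in \mathcal{H}_t} \left( P_{it} - \theta^\top V_i \right)^2 .
$$

Because $V_i$ is held fixed when $\hat{\theta}_t$ is fitted on hold-out data, the last step is an ordinary linear regression, which is why standard inference applies to it.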

Abstract

We develop empirical models that efficiently process large amounts of unstructured product data (text, images, prices, quantities) to produce accurate hedonic price estimates and derived indices. To achieve this, we generate abstract product attributes (or "features") from descriptions and images using deep neural networks. These attributes are then used to estimate the hedonic price function. To demonstrate the effectiveness of this approach, we apply the models to Amazon's data for first-party apparel sales, and estimate hedonic prices. The resulting models have a very high out-of-sample predictive accuracy, with $R^2$ ranging from $80\%$ to $90\%$. Finally, we construct the AI-based hedonic Fisher price index, chained at the year-over-year frequency, and contrast it with the CPI and other electronic indices.
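
To make the index construction concrete, here is a minimal Python sketch of a Fisher index computed from hedonic prices, using the textbook definition (the geometric mean of the Laspeyres and Paasche indices). The function names `fisher_index` and `chain`, the dictionary layout, and the toy numbers are assumptions made for illustration; the paper's exact weighting and chaining details may differ.

```python
from math import sqrt


def fisher_index(h_base, h_curr, q_base, q_curr):
    """Fisher price index between a base period and a current period.

    h_base, h_curr: dict product -> predicted hedonic price in each period
                    (assumed available for every product in either basket).
    q_base, q_curr: dict product -> quantity sold in that period.
    """
    def basket_cost(prices, quantities):
        # Cost of buying the given quantities at the given prices.
        return sum(prices[i] * q for i, q in quantities.items())

    # Laspeyres: base-period basket priced in both periods.
    laspeyres = basket_cost(h_curr, q_base) / basket_cost(h_base, q_base)
    # Paasche: current-period basket priced in both periods.
    paasche = basket_cost(h_curr, q_curr) / basket_cost(h_base, q_curr)
    return sqrt(laspeyres * paasche)


def chain(links):
    """Cumulate period-over-period index links into a level series starting at 1."""
    levels = [1.0]
    for link in links:
        levels.append(levels[-1] * link)
    return levels


# Toy data (hypothetical): product "a" exits and product "c" enters.
h_base = {"a": 10.0, "b": 20.0, "c": 30.0}
h_curr = {"a": 9.0, "b": 22.0, "c": 30.0}
q_base = {"a": 2, "b": 1}
q_curr = {"b": 1, "c": 3}
link = fisher_index(h_base, h_curr, q_base, q_curr)
```

Because hedonic prices are predicted from product attributes, a product that is sold in only one of the two periods can still be priced in both, which is what lets the index handle entry and exit.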

Paper Structure (43 sections, 43 equations, 17 figures, 4 tables)

This paper contains 43 sections, 43 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: An example of product characteristics for a product sold in the Amazon store.
  • Figure 2: Our method for generating hedonic prices: The input consists of images and unstructured text data. The first step of the process creates the moderately high-dimensional numerical embeddings $I$ and $W$ for images and text data via state-of-the-art neural networks: ResNet-50 and BERT. The second step takes input $X = (I,W)$ and creates predictions for hedonic prices $H_t(X)$ using a multi-task neural network. Our multi-task model creates an intermediate lower-dimensional embedding $V = V(X)$, called a value embedding, and then predicts the final prices in all periods $\{H_t(V),\ t=1,\dots,T\}$ using linear functional forms; this makes it easy to perform inference on the last step using hold-out data.
  • Figure 3: Standard architecture of a Deep Neural Network. In the hedonic price prediction network, the penultimate layer is interpreted as an embedding of the product's hedonic value and the output layer contains predicted hedonic prices in all time periods. In comparison, the networks used for text and image processing have very high-dimensional inputs and outputs, with intermediate hidden layers composed of neural sub-networks. The dense embeddings typically result from taking the last hidden layer of the network.
  • Figure 4: Turnover Rate for Products. The figure shows the share of products with transactions in a given month and no transactions in the previous month (blue line), as well as the share with transactions in a given month and no transactions in the next month (orange line).
  • Figure 5: Products transacted per month relative to the number of products transacted in January 2013.
  • ...and 12 more figures
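
The turnover shares described in the Figure 4 caption can be computed with a few lines of Python. This is a sketch only: the function name `turnover_shares`, the input layout (one set of product IDs per month), and the choice of denominator (products transacted in the given month) are assumptions, since the caption does not spell out the denominator.

```python
def turnover_shares(sold_by_month):
    """Entry and exit shares per month.

    sold_by_month: list of sets of product IDs with transactions,
                   in chronological order.
    Returns (entry, exit_): two lists aligned with the input. An entry is
    None where the neighbouring month is unavailable or the month is empty.
    """
    entry, exit_ = [], []
    last = len(sold_by_month) - 1
    for t, current in enumerate(sold_by_month):
        if t == 0 or not current:
            entry.append(None)
        else:
            # Transacted this month, but not in the previous month.
            entry.append(len(current - sold_by_month[t - 1]) / len(current))
        if t == last or not current:
            exit_.append(None)
        else:
            # Transacted this month, but not in the next month.
            exit_.append(len(current - sold_by_month[t + 1]) / len(current))
    return entry, exit_


# Toy data (hypothetical product IDs).
months = [{"a", "b"}, {"b", "c", "d"}, {"d"}]
entry, exit_ = turnover_shares(months)
```

In the toy data, two of the three products sold in the second month are new relative to the first month, and two of the three are not sold again in the third month.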

Theorems & Definitions (1)

  • Remark 1