Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion

Anton Korikov; George Saad; Ethan Baron; Mustafa Khan; Manav Shah; Scott Sanner

TL;DR

This paper tackles the challenge of retrieving items from noisy, multi-aspect reviews by extending reviewed-item retrieval (RIR) to a multi-aspect setting (MA-RIR). It shows that late fusion (LF) fails when the distribution of aspects across an item's reviews is imbalanced, and proposes Aspect Fusion (AF), which combines LLM-driven query aspect extraction, per-aspect scoring, and a choice of fusion strategies, with optional LLM-based reranking. Experiments on Recipe-MPR-derived data show that AF substantially improves MAP@10 on imbalanced distributions (e.g., from $0.36\pm0.04$ to $0.52\pm0.04$) while matching LF on balanced data; LLM reranking offers additional gains when ample context is provided. By isolating aspect-level information during retrieval, the work makes MA-RIR more robust and shows practical potential for multi-aspect product search and recommendation.

Abstract

While user-generated product reviews often contain large quantities of information, their utility in addressing natural language product queries has been limited. A key challenge is the need to aggregate information from multiple low-level sources (reviews) to a higher item level during retrieval. Existing methods for reviewed-item retrieval (RIR) typically take a late fusion (LF) approach, which computes query-item scores by simply averaging the top-K query-review similarity scores for an item. However, we demonstrate that for multi-aspect queries and multi-aspect items, LF is highly sensitive to the distribution of aspects covered by reviews, both in aspect frequency and in the degree of aspect separation across reviews. To address these LF failures, we propose several novel aspect fusion (AF) strategies, which include Large Language Model (LLM) query extraction and generative reranking. Our experiments show that for imbalanced review corpora, AF improves MAP@10 over LF from 0.36 to 0.52, while achieving equivalent performance for balanced review corpora.
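The late fusion scoring described in the abstract (average the top-K query-review similarities for each item) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `dot` and `late_fusion`, the use of a plain dot product as the similarity, and the two-dimensional toy vectors are all assumptions made for the example.

```python
from collections import defaultdict


def dot(u, v):
    """Dot-product similarity between two equal-length vectors (assumed metric)."""
    return sum(a * b for a, b in zip(u, v))


def late_fusion(query_vec, reviews, k_r):
    """Score each item as the mean of its top-k_r query-review similarities.

    reviews: list of (item_id, review_vec) pairs (hypothetical layout).
    Returns a dict {item_id: score}.
    """
    sims_by_item = defaultdict(list)
    for item_id, review_vec in reviews:
        sims_by_item[item_id].append(dot(query_vec, review_vec))

    scores = {}
    for item_id, sims in sims_by_item.items():
        top = sorted(sims, reverse=True)[:k_r]  # keep the k_r best reviews
        scores[item_id] = sum(top) / len(top)   # average them into an item score
    return scores


# Toy data: item A has one matching and one non-matching review,
# item B has two partially matching reviews.
toy_reviews = [
    ("A", [1.0, 0.0]),
    ("A", [0.0, 1.0]),
    ("B", [0.5, 0.5]),
    ("B", [0.5, 0.5]),
]
toy_query = [1.0, 0.0]

scores_k1 = late_fusion(toy_query, toy_reviews, k_r=1)
scores_k2 = late_fusion(toy_query, toy_reviews, k_r=2)
```

In this toy setup, the item scores depend on the choice of K: with `k_r=1` item A is scored on its single best review, while with `k_r=2` its non-matching review is averaged in as well.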

Paper Structure (36 sections, 3 equations, 11 figures, 10 tables)

Introduction
Background
Neural IR
Reviewed-Item Retrieval
Problem Formulation
Fusion
Multi-Aspect Reviewed Item Retrieval
Multi-Aspect Queries
Multi-Aspect Reviewed-Items
Multi-Aspect Review Distributions
Fully Overlapping Distributions
Degree of Separation and Aspect Frequency
Aspect Fusion for MA-RIR
Desiderata of Aspect Fusion
Desideratum 1:
...and 21 more sections

Figures (11)

Figure 1: Two extremes of item aspect distributions, showing reviews for an item with aspects "meatballs" and "ready in 25 minutes": a) Fully overlapping (top): each review mentions all item aspects. b) Fully disjoint with imbalanced aspect frequency (bottom): no review mentions more than one aspect, and some aspects are mentioned much more frequently than others.
Figure 2: a) Top. In (Monolithic) LF, the full query is scored against all reviews, and the top $K_R$ query-review scores are averaged for each item to produce a query-item score. b) Bottom. Aspect Fusion extracts aspects (i.e., query subspans) from a query, performs LF with each aspect, and aggregates the resulting top $K_I$ item lists (i.e., one list per extracted aspect) to a final list.
Figure 3: Monolithic LF versus Aspect Fusion with AMean aggregation. Both methods perform similarly on the fully overlapping dataset, but Aspect Fusion performs significantly better than Monolithic LF on the fully disjoint dataset when $K_R < 30$. On the fully disjoint dataset, Aspect Fusion drops in performance for $K_R > 10$ because, once $K_R$ exceeds the number of reviews per aspect, scoring is based on reviews that are irrelevant to the given aspect. This decline in performance does not occur in the fully overlapping case.
Figure 4: Effect of Aspect Frequency. Aspect Fusion performs better than Monolithic LF for low values of $K_R$, but suffers for higher values of $K_R$. This pattern is explained in the discussion of RQ1.
Figure 5: Aspect Fusion with ground-truth (GT) vs extracted query aspects on fully disjoint reviews. Although GT query aspects perform better, Aspect Fusion with extracted query aspects still improves over Monolithic LF.
...and 6 more figures
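The pipeline in the Figure 2 caption (run LF once per query aspect, then aggregate the per-aspect results) can be sketched as below. This is an illustration under stated assumptions, not the paper's code: the aspects are supplied as hand-made toy vectors rather than extracted by an LLM, the similarity is a plain dot product, the top-$K_I$ list truncation is omitted so that every item is scored for every aspect, and AMean is taken to be the arithmetic mean of the per-aspect item scores. The names `late_fusion` and `aspect_fusion_amean` are hypothetical.

```python
from collections import defaultdict


def dot(u, v):
    """Dot-product similarity (assumed metric for this sketch)."""
    return sum(a * b for a, b in zip(u, v))


def late_fusion(query_vec, reviews, k_r):
    """Mean of the top-k_r query-review similarities per item."""
    sims_by_item = defaultdict(list)
    for item_id, review_vec in reviews:
        sims_by_item[item_id].append(dot(query_vec, review_vec))
    scores = {}
    for item_id, sims in sims_by_item.items():
        top = sorted(sims, reverse=True)[:k_r]
        scores[item_id] = sum(top) / len(top)
    return scores


def aspect_fusion_amean(aspect_vecs, reviews, k_r):
    """Run late fusion once per aspect, then average the per-aspect item scores.

    Simplification: no top-K_I truncation of the per-aspect item lists.
    """
    per_aspect = [late_fusion(a, reviews, k_r) for a in aspect_vecs]
    items = per_aspect[0].keys()
    return {
        item: sum(scores[item] for scores in per_aspect) / len(per_aspect)
        for item in items
    }


# Toy vectors: dimension 0 stands for "meatballs", dimension 1 for "ready in 25 minutes".
# Item A covers both aspects in disjoint reviews with imbalanced frequency (3 vs 1).
# Item B only ever mentions "meatballs".
toy_reviews = (
    [("A", [1.0, 0.0])] * 3 + [("A", [0.0, 1.0])]
    + [("B", [1.0, 0.0])] * 4
)
aspects = [[1.0, 0.0], [0.0, 1.0]]  # one vector per query aspect
monolithic_query = [1.0, 1.0]       # the full query as a single vector

mono_k1 = late_fusion(monolithic_query, toy_reviews, k_r=1)
af_k1 = aspect_fusion_amean(aspects, toy_reviews, k_r=1)
af_k2 = aspect_fusion_amean(aspects, toy_reviews, k_r=2)
```

In this toy data, monolithic LF with `k_r=1` gives items A and B the same score, while aspect fusion ranks A above B because B has no review supporting the second aspect. Raising `k_r` to 2 lowers A's fused score, since A has only one review for the second aspect and an irrelevant review is averaged in, which mirrors the effect described in the Figure 3 caption.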
