Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion
Anton Korikov, George Saad, Ethan Baron, Mustafa Khan, Manav Shah, Scott Sanner
TL;DR
This paper tackles the challenge of retrieving items from noisy, multi-aspect reviews by extending reviewed-item retrieval (RIR) to a multi-aspect setting (MA-RIR). It identifies fundamental failures of late fusion (LF) when aspect distributions across reviews are imbalanced and proposes Aspect Fusion (AF), which combines LLM-driven query aspect extraction, per-aspect scoring, and several fusion strategies, with optional LLM-based reranking. Experiments on Recipe-MPR-derived data show that AF substantially improves MAP@10 on imbalanced distributions (e.g., from $0.36\pm0.04$ to $0.52\pm0.04$) while maintaining parity with LF on balanced data; LLM reranking offers additional gains when ample context is provided. The work advances robust MA-RIR by isolating aspect-level information during retrieval and demonstrates practical potential for improving multi-aspect product search and recommendations.
Abstract
While user-generated product reviews often contain large quantities of information, their utility in addressing natural language product queries has been limited. A key challenge is the need to aggregate information from multiple low-level sources (reviews) to a higher item level during retrieval. Existing methods for reviewed-item retrieval (RIR) typically take a late fusion (LF) approach, which computes query-item scores by simply averaging the top-K query-review similarity scores for an item. However, we demonstrate that for multi-aspect queries and multi-aspect items, LF is highly sensitive to the distribution of aspects covered by reviews, both in aspect frequency and in the degree of aspect separation across reviews. To address these LF failures, we propose several novel aspect fusion (AF) strategies that include Large Language Model (LLM) query extraction and generative reranking. Our experiments show that for imbalanced review corpora, AF can improve MAP@10 over LF from 0.36 to 0.52, while achieving equivalent performance for balanced review corpora.
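To make the contrast between LF and AF concrete, the following is a minimal sketch rather than the paper's implementation. The LF function follows the description above (mean of an item's top-K query-review similarities). The AF function is an assumption: it scores each query aspect separately and averages the per-aspect scores, which is only one plausible fusion choice. The function names, the toy 2-D embeddings, and the hard-coded aspect vectors (standing in for LLM-extracted query aspects) are all hypothetical.

```python
import numpy as np


def late_fusion_score(query_vec, review_vecs, k=3):
    """LF: mean of the top-k query-review similarities for one item.

    Assumes unit-normalized embeddings, so a dot product is a cosine
    similarity, and k >= 1.
    """
    sims = review_vecs @ query_vec
    top_k = np.sort(sims)[-k:]  # the k largest similarities
    return float(top_k.mean())


def aspect_fusion_score(aspect_vecs, review_vecs, k=1):
    """AF sketch (assumed form): score every aspect on its own with a
    top-k mean, then fuse by averaging the per-aspect scores."""
    per_aspect = [late_fusion_score(a, review_vecs, k) for a in aspect_vecs]
    return float(np.mean(per_aspect))


# Toy embedding space: axis 0 is aspect 1, axis 1 is aspect 2.
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# A two-aspect query embedded as a single vector (used by LF) ...
query = (e1 + e2) / np.linalg.norm(e1 + e2)
# ... and as separate aspect vectors (hypothetical LLM extraction output).
aspects = [e1, e2]

# Item A covers both aspects, but aspect 2 appears in only one review.
item_a = np.stack([e1, e1, e1, e2])
# Item B never mentions aspect 2.
item_b = np.stack([e1, e1, e1, e1])

lf_a = late_fusion_score(query, item_a, k=3)
lf_b = late_fusion_score(query, item_b, k=3)
af_a = aspect_fusion_score(aspects, item_a, k=1)
af_b = aspect_fusion_score(aspects, item_b, k=1)

print(f"LF: item A = {lf_a:.3f}, item B = {lf_b:.3f}")
print(f"AF: item A = {af_a:.3f}, item B = {af_b:.3f}")
```

In this toy setup LF cannot separate the two items, because every review matches exactly one half of the query equally well, whereas the per-aspect scores reveal that item B has no support for the second aspect. Real embeddings are far less clean, so the sketch only illustrates the mechanism and does not reproduce the reported results.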
