From What to Why: Thought-Space Recommendation with Small Language Models

Prosenjit Biswas, Pervez Shaik, Abhinav Thorat, Ravi Kolla, Niranjan Pedanekar

TL;DR

PULSE introduces Thought Space, a framework that substitutes costly LLM reasoning with a small-language-model–driven, rationale-grounded representation for recommendation. By generating high-quality rationales with an SLM, aligning them to user history via contrastive learning, and refining the selected rationales through Tree-of-Thought followed by PEFT-based fine-tuning, PULSE achieves strong sequential recommendation performance and robust cross-domain transfer. The approach also demonstrates transferable reasoning improvements to multi-hop QA tasks like HotpotQA, indicating broader applicability of rationale-driven, contrastively learned representations. Overall, PULSE shows that compact models, when guided by structured, behavior-aligned reasoning signals, can outperform billion-parameter LLMs in both accuracy and efficiency, with practical implications for scalable, interpretable recommender systems.

Abstract

Large Language Models (LLMs) have advanced recommendation capabilities through enhanced reasoning, but pose significant challenges for real-world deployment due to high inference costs. Conversely, while Small Language Models (SLMs) offer an efficient alternative, their reasoning capabilities for recommendation remain underexplored. Existing systems often use natural language rationales merely as unsupervised descriptive text, failing to harness their full potential as learning signals. In this work, our main idea is to use SLMs, rather than knowledge distilled from LLMs, to build a shared understanding of users and items across multiple domains, which we call Thought Space. To that end, we propose PULSE (Preference Understanding by Latent Semantic Embeddings), a framework that treats SLM-generated rationales as direct learning signals, supervising them with interaction histories to jointly model user actions (what) and their semantic drivers (why). Existing methods consider only interactions, such as sequences and embeddings, whereas PULSE treats rationales as first-class signals; this design yields embeddings that are more robust and generalizable. Extensive experiments demonstrate that PULSE outperforms leading ID, Collaborative Filtering (CF), and LLM-based sequential recommendation models across multiple benchmark datasets. Furthermore, PULSE exhibits superior transferability in cross-domain recommendation and demonstrates strong performance on downstream tasks such as reasoning-oriented question answering. Our code is available at https://anonymous.4open.science/r/Thinking_PULSE-0FC5/README.md.

Paper Structure

This paper contains 15 sections, 3 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: The architecture diagram. Marked components are either trainable or frozen; frozen components are used only during inference.
  • Figure 2: Two-stage prompting framework. In Phase 1, the LLM generates a rationale from the user’s history and choice. In Phase 2, reasoning is refined by human-like oversight combined with Tree-of-Thought exploration to derive the best reason. [Phase 1: LLM rationale generation; Phase 2: LLM + ToT reasoning]
  • Figure 3: Thought Space before vs. after contrastive alignment. (a) Pre-training: behavioral embeddings (green) and rationale embeddings, with positives (blue) and negatives (red), are misaligned; the DistilBERT-initialized behavioral and rationale texts occupy different regions. (b) Post-training: after optimizing Eq. (1), positives cluster near their behavioral anchors (blue edges shorten), while negatives are repelled (red edges lengthen).
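
The alignment described in the Figure 3 caption can be illustrated with a small sketch. Eq. (1) is not reproduced in this excerpt, so the code below assumes a standard InfoNCE-style contrastive loss as a stand-in; it should not be read as the paper's exact objective. The function name `info_nce`, the `temperature` value, and the toy 2-D vectors are all hypothetical and chosen only for illustration.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one behavioral anchor (assumed form, not the paper's Eq. (1)).

    anchor    -- behavioral embedding of a user's interaction history
    positive  -- embedding of the rationale that matches that history
    negatives -- embeddings of rationales that do not match
    """
    a = normalize(anchor)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(a, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive rationale.
    return log_denominator - logits[0]

# Toy 2-D embeddings (hypothetical values).
anchor = [1.0, 0.0]
aligned = info_nce(anchor, positive=[0.9, 0.1], negatives=[[-1.0, 0.2]])
misaligned = info_nce(anchor, positive=[-1.0, 0.2], negatives=[[0.9, 0.1]])
print(aligned < misaligned)
```

Under this assumed loss, the value is small when the positive rationale sits close to its behavioral anchor and the negatives sit far away, which matches the post-training picture in panel (b); minimizing it by gradient descent would shorten the blue edges and lengthen the red ones.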