Integrating Domain Knowledge into Large Language Models for Enhanced Fashion Recommendations
Zhan Shi, Shanglin Yang
TL;DR
The work addresses the challenge of providing personalized fashion recommendations that remain robust under distribution shifts by integrating domain knowledge into a large language model. It introduces the Fashion Large Language Model (FLLM), trained with auto-prompt generation and enhanced by Retrieval-Augmented Generation (RAG) to tailor suggestions to individual users. Key contributions include a domain-aware fine-tuning pipeline built on FITB (fill-in-the-blank) and binary data, template-based QA prompts alongside QA prompts generated automatically by an LLM, and a multi-path retrieval mechanism that conditions recommendations on both user context and fashion knowledge. The results show improved accuracy, strong few-shot learning, and better adaptability than the baselines, underscoring the practical potential of combining LLMs with domain retrieval for personalized, explainable fashion guidance.
Abstract
Fashion, deeply rooted in sociocultural dynamics, evolves as individuals emulate styles popularized by influencers and iconic figures. In the quest to replicate such refined tastes with artificial intelligence, traditional fashion ensemble methods have relied primarily on supervised learning to imitate the decisions of style icons. These methods falter under distribution shifts: slight variations in the input can cause the replicated style to diverge from the intended one. Meanwhile, large language models (LLMs) have become prominent across many sectors, recognized for their user-friendly interfaces, strong conversational skills, and advanced reasoning capabilities. Building on these strengths to address the limitations of supervised ensemble methods, we introduce the Fashion Large Language Model (FLLM), which employs auto-prompt generation training strategies to strengthen its capacity for personalized fashion advice while retaining essential domain knowledge. In addition, integrating a retrieval augmentation technique at inference time lets the model adapt better to individual preferences. Our results show that this approach surpasses existing models in accuracy, interpretability, and few-shot learning.
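To make the idea of multi-path retrieval at inference time more concrete, the sketch below retrieves separately from a user-history path and a fashion-knowledge path, fuses the two rankings, and assembles the fused context into a prompt. It is a minimal illustration under stated assumptions, not the FLLM implementation. The token-overlap scorer is a toy stand-in for a real retriever. Reciprocal rank fusion (RRF) is a swapped-in fusion technique, because the summary does not say how the paths are combined. All function names (`retrieve`, `reciprocal_rank_fusion`, `build_prompt`) are hypothetical.

```python
from collections import defaultdict


def keyword_score(query, doc):
    # Toy relevance (assumption): count of shared lowercase tokens.
    # A real system would use a dense or sparse retriever instead.
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query, corpus, k):
    # One retrieval path: rank a single corpus, keep the top-k docs with score > 0.
    scored = [(keyword_score(query, doc), doc) for doc in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep corpus order
    return [doc for _, doc in scored[:k]]


def reciprocal_rank_fusion(ranked_lists, c=60):
    # Swapped-in fusion step: each doc scores sum(1 / (c + rank)) across paths.
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += 1.0 / (c + rank)
    return sorted(scores, key=lambda doc: -scores[doc])


def build_prompt(query, user_history, fashion_kb, k=2):
    # Hypothetical two-path setup: user context path + fashion knowledge path.
    paths = [retrieve(query, user_history, k), retrieve(query, fashion_kb, k)]
    context = reciprocal_rank_fusion(paths)[:k]
    lines = ["You are a fashion assistant.", "Context:"]
    lines += [f"- {item}" for item in context]
    lines.append(f"Question: {query}")
    return "\n".join(lines)


history = ["user liked white sneakers", "user bought a navy blazer"]
kb = ["navy blazer pairs well with white sneakers", "linen suits a summer wedding"]
print(build_prompt("what shoes go with a navy blazer", history, kb))
```

With this toy data, the fused context keeps the user's blazer purchase and the blazer-and-sneakers pairing rule while dropping the weakly matching linen note. RRF is used here only because it merges rankings from heterogeneous paths without calibrating their raw scores against each other.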
