Legommenders: A Comprehensive Content-Based Recommendation Library with LLM Support

Qijiong Liu, Lu Fan, Xiao-Ming Wu

TL;DR

Legommenders addresses cold-start and data-efficiency challenges in content-based recommendation by enabling end-to-end training of content encoders, behavior fusion, and click prediction, with explicit support for LLM-based encoding and data augmentation. It provides a modular four-component architecture with 15 content operators, 8 behavior operators, and 9 predictors, and supports LLMs as both content encoders and data generators, enabling over 1,000 model configurations across 15 datasets. A caching inference pipeline speeds up evaluation by up to 50x. Experiments on MIND with GPT-augmented data show that integrating LLMs and end-to-end training yields substantial performance gains, supporting Legommenders as a scalable platform for content-aware recommendation research.
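As a rough sanity check on the "over 1,000 model configurations" figure, the sketch below enumerates every combination of one content operator, one behavior operator, and one predictor. It assumes the three operator families can be combined freely, which the summary does not state explicitly, and the operator names are placeholders rather than identifiers from the library.

```python
from itertools import product

# Counts are taken from the summary above; the names are hypothetical placeholders,
# not real Legommenders operator identifiers.
content_ops = [f"content_op_{i}" for i in range(15)]
behavior_ops = [f"behavior_op_{i}" for i in range(8)]
predictors = [f"predictor_{i}" for i in range(9)]

# Assumption: any content operator can be paired with any behavior operator
# and any predictor, so the configuration space is a full Cartesian product.
configurations = list(product(content_ops, behavior_ops, predictors))

print(len(configurations))  # 15 * 8 * 9 = 1080
```

Under that assumption the product alone gives 1,080 combinations, which is consistent with the stated figure before any LLM-based encoders are counted.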

Abstract

We present Legommenders, a unique library designed for content-based recommendation that enables the joint training of content encoders alongside behavior and interaction modules, thereby integrating content understanding directly into the recommendation pipeline. Legommenders allows researchers to effortlessly create and analyze over 1,000 distinct models across 15 diverse datasets. Further, it supports the incorporation of contemporary large language models, both as feature encoders and as data generators, offering a robust platform for developing state-of-the-art recommendation models and enabling more personalized and effective content delivery.

Paper Structure

This paper contains 10 sections, 4 equations, 3 figures, 2 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Overview of the Legommenders package.
  • Figure 2: A quick use of Legommenders.
  • Figure 3: Examples for model and dataset configurations.