Towards Personalized Bangla Book Recommendation: A Large-Scale Multi-Entity Book Graph Dataset

Rahin Arefin Ahmed, Md. Anik Chowdhury, Sakil Ahmed Sheikh Reza, Devnil Bhattacharjee, Muhammad Abdullah Adnan, Nafis Sadeq

TL;DR

This work addresses the lack of large-scale Bangla recommendation data by introducing RokomariBG, a multi-entity heterogeneous book graph with eight relation types and 23 side features, enabling joint modeling of user interactions, relational structure, and textual content. A comprehensive benchmark evaluates a broad spectrum of models, with the Neural Two-Tower Retrieval with Side Features achieving the best performance ($\mathrm{NDCG@10}=0.204$, $\mathrm{NDCG@50}=0.276$), underscoring the value of combining side information and relational knowledge in low-resource settings. The dataset, extraction pipeline, and benchmarks together provide a public resource for reproducible evaluation and future research in Bangla e-commerce recommendation, including cold-start, cross-lingual, and explainable approaches.

Abstract

Personalized book recommendation in Bangla literature has been constrained by the lack of structured, large-scale, and publicly available datasets. This work introduces RokomariBG, a large-scale, multi-entity heterogeneous book graph dataset designed to support research on personalized recommendation in a low-resource language setting. The dataset comprises 127,302 books, 63,723 users, 16,601 authors, 1,515 categories, 2,757 publishers, and 209,602 reviews, connected through eight relation types and organized as a comprehensive knowledge graph. To demonstrate the utility of the dataset, we provide a systematic benchmarking study on the Top-N recommendation task, evaluating a diverse set of representative recommendation models, including classical collaborative filtering methods, matrix factorization models, content-based approaches, graph neural networks, a hybrid matrix factorization model with side information, and a neural two-tower retrieval architecture. The benchmarking results highlight the importance of leveraging multi-relational structure and textual side information, with neural retrieval models achieving the strongest performance (NDCG@10 = 0.204). Overall, this work establishes a foundational benchmark and a publicly available resource for Bangla book recommendation research, enabling reproducible evaluation and future studies on recommendation in low-resource cultural domains. The dataset and code are publicly available at https://github.com/backlashblitz/Bangla-Book-Recommendation-Dataset
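
For readers unfamiliar with the reported metric, the following is a minimal sketch of how NDCG@k is commonly computed for a single user in Top-N recommendation. It assumes binary relevance (a held-out book is either relevant or not) and a log2 discount; the paper's exact evaluation protocol may differ, and the function name `ndcg_at_k` and its arguments are hypothetical, not taken from the released code.

```python
import math


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Binary-relevance NDCG@k for one user (illustrative assumption,
    not necessarily the evaluation code used for the benchmark)."""
    relevant = set(relevant_items)

    # DCG: a hit at 0-based rank i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked_items[:k])
        if item in relevant
    )

    # Ideal DCG: all relevant items (at most k) placed at the top ranks.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))

    # Users with no relevant items score 0 by convention here.
    return dcg / idcg if idcg > 0 else 0.0
```

A dataset-level score such as the NDCG@10 reported above would then typically be the mean of this per-user value over all test users.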

Paper Structure (58 sections, 6 equations, 23 tables)