How public datasets constrain the development of diversity-aware news recommender systems, and what law could do about it

Max van Drunen, Sanne Vrijenhoek

TL;DR

This paper investigates how public data resources constrain the realization of diversity-aware news recommender systems and argues that European policy can provide structural access to the data such systems need. It analyzes four widely used public datasets (MIND, Adressa, EB-NeRD, Globo) to assess their suitability for diversity objectives, finding that current data mostly support short-term engagement optimization and lack rich metadata on topics, viewpoints, formats, and audiences. The authors outline the data needed to enable normative diversity in recommendations and highlight the gaps in existing datasets, warning that, because news varies across regions and over time, no single dataset can resolve these limitations. They advocate a proactive European policy path, including public funding for high-quality, diverse datasets and greater involvement of public service media, to safeguard editorial values and media pluralism while reducing dependence on large tech platforms. In the short term, combining multiple regional datasets could broaden the scope of research; in the long term, a concerted policy and institutional effort is required to establish foundational datasets and data-sharing infrastructures that align with democratic principles.

Abstract

News recommender systems increasingly determine what news individuals see online. Over the past decade, researchers have extensively critiqued recommender systems that prioritise news based on user engagement. To offer an alternative, researchers have analysed how recommender systems could support the media's ability to fulfil its role in democratic society by recommending news based on editorial values, particularly diversity. However, there continues to be a large gap between normative theory on how news recommender systems should incorporate diversity, and technical literature that designs such systems. We argue that to realise diversity-aware recommender systems in practice, it is crucial to pay attention to the datasets that are needed to train modern news recommenders. We aim to make two main contributions. First, we identify the information a dataset must include to enable the development of the diversity-aware news recommender systems proposed in normative literature. Based on this analysis, we assess the limitations of currently available public datasets, and show what potential they do have to expand research into diversity-aware recommender systems. Second, we analyse why and how European law and policy can be used to provide researchers with structural access to the data they need to develop diversity-aware news recommender systems.

Paper Structure

This paper contains 21 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Counts of the datasets used in papers on developing news recommender systems published between January 1, 2022 and December 31, 2024. Excludes datasets cited 5 times or fewer.