(De)-Indexing and the Right to be Forgotten

Salvatore Vilella; Giancarlo Ruffo

TL;DR

The paper surveys information retrieval (IR) models, including boolean, probabilistic, vector space, and embedding-based approaches, and discusses the role of Large Language Models (LLMs) in enhancing data processing capabilities.

Abstract

In the digital age, the challenge of forgetfulness has emerged as a significant concern, particularly regarding the management of personal data and its accessibility online. The right to be forgotten (RTBF) allows individuals to request the removal of outdated or harmful information from public access, yet implementing this right poses substantial technical difficulties for search engines. This paper aims to introduce non-experts to the foundational concepts of information retrieval (IR) and de-indexing, which are critical for understanding how search engines can effectively "forget" certain content. We will explore various IR models, including boolean, probabilistic, vector space, and embedding-based approaches, as well as the role of Large Language Models (LLMs) in enhancing data processing capabilities. By providing this overview, we seek to highlight the complexities involved in balancing individual privacy rights with the operational challenges faced by search engines in managing information visibility.

Paper Structure

This paper contains 16 sections, 16 equations, 2 figures.

Introduction
Models of Information Retrieval
Boolean Models and Document Representations
Limitations of Boolean Query Models
Vector Space Models
Probabilistic Models
The Probabilistic Relevance Model
BM25 and Term Weighting
Document and Word Embeddings
Word Embeddings
Document Embeddings
Large Language Models
Training Phase
Fine-Tuning Phase
Early and Modern LLMs
...and 1 more sections

Figures (2)

Figure 1: Left: interest over time in the Google search queries "machine learning" and "llm". The "llm" query gains momentum and approaches the level of the "machine learning" query over time. Right: the evolutionary tree of LLMs [yang2023harnessing].
Figure 2: An intuitive comparison between the steps required to train a special-service dog and the training and fine-tuning phases of an LLM [google_intro_llms].
