Sustainable techniques to improve Data Quality for training image-based explanatory models for Recommender Systems

Jorge Paz-Ruza, David Esteban-Martínez, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas

TL;DR

This work tackles data-quality challenges in visual explainability for recommender systems that rely on user-uploaded images. It addresses cold-start sparsity and the lack of explicitly labelled negatives while prioritizing sustainability. It introduces three sustainable data-quality techniques and integrates them into three state-of-the-art explainability models (ELVis, MF-ELVis, BRIE): per-user Positive-Unlabelled Learning to identify reliable negatives, transform-based image augmentation, and text-to-image generative augmentation. Experiments on real-world restaurant datasets show improvements of up to about 5% in explanation ranking metrics with reduced training emissions. They also reveal a nuanced trade-off between augmentation overhead and long-term sustainability. The authors provide an open-source library to enable reproducibility and further exploration of sustainable data-quality strategies for explainable RS.

Abstract

Visual explanations based on user-uploaded images are an effective and self-contained approach to providing transparency in Recommender Systems (RS). However, intrinsic limitations of the data used in this explainability paradigm cause existing approaches to train on low-quality data that is highly sparse and suffers from labelling noise. Popular training enrichment approaches, such as model enlargement or massive data gathering, are expensive and environmentally unsustainable. We therefore seek to provide better visual explanations in RS in line with the principles of Responsible AI. In this work, we study the intersection of effective and sustainable training enrichment strategies for visual-based RS explainability models. To this end, we develop three novel strategies that focus on training Data Quality: 1) selection of reliable negative training examples using Positive-Unlabelled Learning, 2) transform-based data augmentation, and 3) text-to-image generative data augmentation. We integrated these strategies into three state-of-the-art explainability models and tested them on multiple real-world restaurant recommendation explanation datasets. The integration increases the models' performance on relevant ranking metrics by 5% without penalizing their practical long-term sustainability.
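The second strategy, transform-based augmentation, can be illustrated as follows. The sketch generates label-preserving variants of a user's existing photos and adds them as extra positive (user, image) pairs, which densifies the training data of sparse users. It is a minimal, dependency-free sketch with several assumptions. Images are toy grayscale pixel grids. The two transforms (horizontal flip and brightness jitter) and the helper names `hflip`, `jitter_brightness` and `augment_user_images` are illustrative choices, not the paper's exact transform set.

```python
import random
from typing import Callable, List, Tuple

# Toy stand-in for a photo: a grid of grayscale pixel values in [0, 255].
Image = List[List[int]]
Transform = Callable[[Image, random.Random], Image]


def hflip(img: Image, rng: random.Random) -> Image:
    """Mirror the image left-to-right (rng unused, kept for a uniform signature)."""
    return [list(reversed(row)) for row in img]


def jitter_brightness(img: Image, rng: random.Random) -> Image:
    """Scale every pixel by a random factor in [0.7, 1.3], clamped to [0, 255]."""
    factor = rng.uniform(0.7, 1.3)
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


# Hypothetical transform set; a real pipeline would likely use more transforms.
TRANSFORMS: List[Transform] = [hflip, jitter_brightness]


def augment_user_images(
    user_images: List[Tuple[str, Image]],
    copies_per_image: int,
    seed: int = 0,
) -> List[Tuple[str, Image]]:
    """Return new (user, image) positive pairs built from each user's own photos.

    Originals are left untouched; every augmented copy keeps its author as label.
    """
    rng = random.Random(seed)
    augmented: List[Tuple[str, Image]] = []
    for user, img in user_images:
        for _ in range(copies_per_image):
            transform = rng.choice(TRANSFORMS)
            augmented.append((user, transform(img, rng)))
    return augmented
```

In a real pipeline these transforms would act on actual photos, for instance via torchvision's `RandomHorizontalFlip` and `ColorJitter`. The augmented images would presumably go through the same image encoder as the originals before training.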

Paper Structure

This paper contains 23 sections, 1 equation, 6 figures and 3 tables.

Figures (6)

  • Figure 1: Topologies and optimization of ELVis, MF-ELVis and BRIE (Díez et al., 2020; Paz-Ruza et al., 2022; Paz-Ruza et al., 2024).
  • Figure 2: User-personalized PU Learning technique proposed to select reliable negative examples (bad image explanations) for each user in recommendation personalization contexts. Here, the decision boundary for reliable negative selection is shown for two users A and B with different explanatory preferences.
  • Figure 3: Example of transform-based data augmentation applied to a user's existing images, here to an image uploaded with a restaurant review.
  • Figure 3: Training carbon emissions and execution time for explainability models ELVis, MF-ELVis and BRIE with and without the proposed Data Quality explainability techniques.
  • Figure 4: Prompt structure (left) and examples of generated training images from reviews (right) through generative augmentation in a restaurant context.
  • ...and 1 more figure
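The generative augmentation shown in Figure 4 turns review text into prompts for a text-to-image model. The sketch below illustrates that flow. The prompt template, the `Review` fields, the word limit, and the function names are hypothetical stand-ins, not the prompt structure from Figure 4. The image generator is passed in as a plain callable, so the sketch runs without loading any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, TypeVar

ImageT = TypeVar("ImageT")  # whatever the text-to-image generator returns


@dataclass
class Review:
    user_id: str
    restaurant: str
    text: str


# Hypothetical template; the paper's actual prompt structure is shown in Figure 4.
PROMPT_TEMPLATE = (
    'A photo taken by a customer at the restaurant "{restaurant}". '
    'Their review says: "{snippet}"'
)


def build_prompt(review: Review, max_words: int = 30) -> str:
    """Fill the template with the restaurant and the first `max_words` words of the review."""
    snippet = " ".join(review.text.split()[:max_words])
    return PROMPT_TEMPLATE.format(restaurant=review.restaurant, snippet=snippet)


def generative_augmentation(
    reviews: List[Review],
    generate: Callable[[str], ImageT],
    images_per_review: int = 1,
) -> List[Tuple[str, ImageT]]:
    """Create synthetic (user, image) positives by prompting `generate` once per image."""
    samples: List[Tuple[str, ImageT]] = []
    for review in reviews:
        prompt = build_prompt(review)
        for _ in range(images_per_review):
            samples.append((review.user_id, generate(prompt)))
    return samples
```

With Hugging Face diffusers, `generate` could for instance be `lambda p: pipe(p).images[0]` for a loaded `StableDiffusionPipeline`. The source does not say which generator the authors used. Keeping the generator injectable also means the one-off generation cost is paid outside the training loop, which is where the augmentation overhead mentioned above arises.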