Agro-STAY: Data Collection and Analysis of Information on Alternative Agriculture from YouTube

Laura Maxim, Julien Rabatel, Jean-Marc Douguet, Natalia Grabar, Roberto Interdonato, Sébastien Loustau, Mathieu Roche, Maguelonne Teisseire

TL;DR

This work introduces Agro-STAY, a platform for collecting YouTube videos and comments to study alternative agricultural practices with NLP and language models. It combines sociological aims with automated content processing, drawing on six expert channels, transcript restoration, and comment analysis to support two classification tasks: six information types and controversy. Initial results with CamemBERT fine-tuned on annotated comments are promising but uneven across the two tasks (macro F1 of 72.01 for controversy and 40.57 for information types), which points to the need for data balancing and better preprocessing. The approach offers a scalable, interdisciplinary toolkit for investigating knowledge flows and social dynamics around self-sufficiency on digital platforms, and may be applicable to other domains and languages.
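The gap between the two macro F1 scores is easier to read with the metric spelled out. The sketch below is not the authors' evaluation code; it computes macro F1 by hand on a hypothetical toy set (the labels `majority` and `minority` and all counts are invented for illustration) to show how a weak minority class lowers the macro average even when accuracy stays high. Scores are on a 0 to 1 scale, whereas the figures quoted above appear to be percentages.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return (macro F1, per-class F1) over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    per_class = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0
    # Macro averaging: every class counts equally, whatever its frequency.
    return sum(per_class.values()) / len(labels), per_class


# Hypothetical imbalanced toy data: 8 majority-class items, 2 minority-class items.
y_true = ["majority"] * 8 + ["minority"] * 2
y_pred = ["majority"] * 8 + ["majority", "minority"]

macro, per_class = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy = {accuracy:.2f}")
print(f"macro F1 = {macro:.4f}")
for label, score in per_class.items():
    print(f"  F1[{label}] = {score:.4f}")
```

Because each class contributes equally to the average, a single misclassified minority item moves macro F1 far more than it moves accuracy. This is one reason class balancing is a natural next step when a multi-class task scores low on this metric.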

Abstract

In response to current crises (climatic, social, economic), self-sufficiency, a set of practices combining energy sobriety, self-production of food and energy, and self-construction, is attracting growing interest. The CNRS STAY project (Savoirs Techniques pour l'Auto-suffisance, sur YouTube) explores this topic by analyzing the technical knowledge shared on YouTube. We present Agro-STAY, a platform designed for the collection, processing, and visualization of data from YouTube videos and their comments. Using Natural Language Processing (NLP) techniques and language models, this work enables a fine-grained analysis of the alternative agricultural practices described online.

Paper Structure

This paper contains 11 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Screenshots of Agro-STAY.
  • Figure 2: Screenshots of Agro-STAY.
  • Figure 3: Class distribution in the annotated data.