
Where It Really Matters: Few-Shot Environmental Conservation Media Monitoring for Low-Resource Languages

Sameer Jain, Sedrick Scott Keh, Shova Chettri, Karun Dewan, Pablo Izquierdo, Johanna Prussman, Pooja Shreshtha, Cesar Suarez, Zheyuan Ryan Shi, Lei Li, Fei Fang

TL;DR

This work addresses the scarcity of labeled data for environmental conservation news monitoring in low-resource languages by introducing NewsSerow, a multilingual few-shot framework that combines zero-shot summarization, in-context demonstrations, and a self-reflection module. Built on LLMs, NewsSerow uses at most 10 in-context examples to classify conservation-relevant content and achieves performance competitive with fully fine-tuned models while requiring far less labeled data. Empirical results on Nepali and Colombian Spanish show strong improvements over zero-shot and few-shot baselines, and deployment by WWF in Nepal and Colombia demonstrates real-world applicability and impact. The approach reduces the operational burden of monitoring, scales to additional languages, and provides interpretable reasoning through explanations and reflections that aid NGOs in decision-making.

Abstract

Environmental conservation organizations routinely monitor news content on conservation in protected areas to maintain situational awareness of developments that can have an environmental impact. Existing automated media monitoring systems require large amounts of data labeled by domain experts, which is feasible at scale only for high-resource languages like English. However, such tools are most needed in the Global South, where news of interest is mainly in local low-resource languages and far fewer experts are available to annotate datasets sustainably. In this paper, we propose NewsSerow, a method to automatically recognize environmental conservation content in low-resource languages. NewsSerow is a pipeline of summarization, in-context few-shot classification, and self-reflection using large language models (LLMs). Using at most 10 Nepali news articles as demonstration examples, NewsSerow significantly outperforms other few-shot methods and achieves performance comparable to models fully fine-tuned on thousands of examples. The World Wide Fund for Nature (WWF) has deployed NewsSerow for media monitoring in Nepal, significantly reducing its operational burden and ensuring that AI tools for conservation actually reach the communities that need them most. NewsSerow has also been deployed in countries with other languages, such as Colombia.
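The three-stage pipeline (summarization, in-context few-shot classification, self-reflection) can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the `llm` callable, the prompt wording, the `Demo` record, and the Yes/No answer parsing are all assumptions, and `stub_llm` is a toy stand-in for a real model. Following the Figure 1 description, the article summary is produced once and then reused by both the classification and reflection stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical LLM interface: prompt string in, model reply string out.
LLM = Callable[[str], str]


@dataclass
class Demo:
    """A labeled demonstration (hypothetical format)."""
    summary: str       # summary of a labeled example article
    relevant: bool     # True if the article is conservation-relevant
    explanation: str   # short rationale shown to the model


def summarize(llm: LLM, article: str) -> str:
    # Stage 1: zero-shot summarization of the (possibly Nepali/Spanish) article.
    return llm(f"Summarize the following news article in English:\n{article}")


def classify(llm: LLM, summary: str, demos: List[Demo]) -> str:
    # Stage 2: few-shot classification with labeled in-context demonstrations.
    shots = "\n\n".join(
        f"Summary: {d.summary}\nExplanation: {d.explanation}\n"
        f"Answer: {'Yes' if d.relevant else 'No'}"
        for d in demos
    )
    return llm(
        "Is each article relevant to environmental conservation?\n\n"
        f"{shots}\n\nSummary: {summary}\nExplanation:"
    )


def reflect(llm: LLM, summary: str, draft: str) -> bool:
    # Stage 3: self-reflection; the model re-checks its own draft answer.
    reply = llm(
        f"Summary: {summary}\nDraft answer: {draft}\n"
        "Review the draft. Is the article relevant to environmental "
        "conservation? End your reply with Yes or No."
    )
    # Parse the final Yes/No token (assumed answer format).
    return reply.strip().rstrip(".").lower().endswith("yes")


def news_serow(llm: LLM, article: str, demos: List[Demo]) -> Tuple[bool, str]:
    summary = summarize(llm, article)  # computed once, reused by both later stages
    draft = classify(llm, summary, demos)
    return reflect(llm, summary, draft), draft  # final label plus explanation text


def stub_llm(prompt: str) -> str:
    # Toy stand-in for a real LLM, keyed on which stage's prompt it receives.
    if prompt.startswith("Summarize"):
        return "A rhino was found poached inside a national park."
    if "Draft answer:" in prompt:
        return "Poaching in a protected area affects wildlife. Yes"
    return "Poaching in a national park is a conservation issue. Answer: Yes"


demos = [
    Demo("A forest fire spread into a protected area.", True, "Habitat loss."),
    Demo("The national cricket team won its match.", False, "Sports news."),
]
label, explanation = news_serow(stub_llm, "<article text>", demos)
print(label)  # → True
```

Returning the draft alongside the label keeps the model's explanation available for human review, which matches the interpretability motivation in the TL;DR; swapping `stub_llm` for a real model client is the only change the sketch assumes.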

Paper Structure

This paper contains 29 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: NewsSerow prompt pipeline. We illustrate the flow of the test example in red. Model responses that are used as input for later prompts are color-coded by background. For example, the test article summary (highlighted in yellow) is generated by the summarization module and used in the classification and reflection modules.
  • Figure 2: Fine-tuned XLM-R's performance versus the number of training examples on the (a) Nepali (purple) and (b) Spanish (green) test sets. The x-axis shows the number of target-language (Nepali/Spanish) training examples; in every setting, 1647 English examples are also used to fine-tune the models. NewsSerow's 10-shot performance on the same test sets is shown in red, with a light red error bar.
  • Figure 3: NewsSerow's performance versus the number of in-context examples.
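An experiment like the one behind Figure 3 (performance as a function of the number of in-context examples) can be approximated with a small evaluation harness. This sketch is hypothetical: the captions say only "performance", so binary F1 on the conservation-relevant class is assumed as the metric, and drawing demonstrations at random under a few seeds to get a mean and spread is an assumption rather than the paper's documented protocol. `classify` can be any function that labels an article given a list of demonstrations; the keyword classifier below is a toy placeholder.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, bool]  # (article text, is conservation-relevant)
# Hypothetical classifier signature: (article, demonstrations) -> predicted label.
Classifier = Callable[[str, Sequence[Example]], bool]


def f1_score(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    # Binary F1 for the positive (conservation-relevant) class.
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def sweep_num_shots(
    classify: Classifier,
    pool: List[Example],
    test: List[Example],
    ks: Sequence[int],
    seeds: Sequence[int] = (0, 1, 2),
) -> Dict[int, Tuple[float, float]]:
    # For each k, sample k demonstrations per seed; report (mean, std) of F1.
    gold = [y for _, y in test]
    results = {}
    for k in ks:
        scores = []
        for seed in seeds:
            demos = random.Random(seed).sample(pool, k)
            pred = [classify(x, demos) for x, _ in test]
            scores.append(f1_score(gold, pred))
        results[k] = (statistics.mean(scores), statistics.pstdev(scores))
    return results


def toy(article: str, demos: Sequence[Example]) -> bool:
    # Toy keyword classifier that ignores its demonstrations (placeholder only).
    return "forest" in article


pool = [("forest fire near a park", True), ("cricket final", False),
        ("new wildlife law", True), ("stock market rally", False)]
test = [("illegal logging in a forest", True), ("election results", False)]
print(sweep_num_shots(toy, pool, test, ks=[1, 2]))  # → {1: (1.0, 0.0), 2: (1.0, 0.0)}
```

Sampling several demonstration sets per k separates the effect of how many examples are shown from the effect of which examples happen to be chosen.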