Controlled Territory and Conflict Tracking (CONTACT): (Geo-)Mapping Occupied Territory from Open Source Intelligence

Paul K. Mandal, Cole Leo, Connor Hurley

TL;DR

CONTACT tackles real-time inference of territorial control from noisy OSINT by comparing SetFit and prompt-tuned BLOOMZ on a small, hand-labeled dataset of ISIS-related news with VIINA-style labels. The results indicate that a prompt-tuned, decoder-style model with label definitions embedded in the prompt can outperform a traditional few-shot embedding classifier in low-resource settings. Key contributions include a lightweight, open-source data pipeline, the hand-labeled VIINA-style dataset, and evidence that prompt-based supervision reduces annotation burden while enabling structured multi-label inference. The work suggests practical value for conflict-monitoring workflows, provided that larger-scale validation and robust generalization are pursued.

Abstract

Open-source intelligence provides a stream of unstructured textual data that can inform assessments of territorial control. We present CONTACT, a framework for territorial control prediction using large language models (LLMs) and minimal supervision. We evaluate two approaches: SetFit, an embedding-based few-shot classifier, and a prompt tuning method applied to BLOOMZ-560m, a multilingual generative LLM. Our model is trained on a small hand-labeled dataset of news articles covering ISIS activity in Syria and Iraq, using prompt-conditioned extraction of control-relevant signals such as military operations, casualties, and location references. We show that the BLOOMZ-based model outperforms the SetFit baseline, and that prompt-based supervision improves generalization in low-resource settings. CONTACT demonstrates that LLMs fine-tuned using few-shot methods can reduce annotation burdens and support structured inference from open-ended OSINT streams. Our code is available at https://github.com/PaulKMandal/CONTACT/.

Paper Structure

This paper contains 9 sections, 1 figure, and 1 table.

Figures (1)

  • Figure 1: Pipeline for the CONTACT framework: articles are scraped from archived news sources, preprocessed, and labeled with territorial control indicators. The resulting dataset is used to fine-tune two models: a SetFit classifier based on sentence embeddings, and a BLOOMZ model using prompt tuning. Fine-tuned models perform multi-label inference, with future extensions including location extraction and dashboard integration.
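
The prompt-based multi-label inference step named in the caption can be illustrated with a minimal sketch. It assumes a prompt that lists label definitions ahead of the article and a model that answers with a comma-separated list of label names. The label names, their definitions, the prompt wording, and the helper functions `build_prompt` and `parse_labels` are all hypothetical, loosely based on the signals mentioned in the abstract (military operations, casualties, location references); they are not taken from the CONTACT code. The model call itself is left out, so only the prompt construction and output parsing are shown.

```python
from typing import Dict, List

# Hypothetical label set: names and definitions are illustrative assumptions,
# not the labels used in the CONTACT dataset.
LABEL_DEFINITIONS: Dict[str, str] = {
    "military_operation": "the article reports an attack, offensive, or other armed action",
    "casualties": "the article reports people killed or wounded",
    "location_reference": "the article names a specific town, city, or region",
}


def build_prompt(article: str, definitions: Dict[str, str] = LABEL_DEFINITIONS) -> str:
    """Embed the label definitions in the prompt ahead of the article text."""
    lines = ["Label definitions:"]
    for name, definition in definitions.items():
        lines.append(f"- {name}: {definition}")
    lines.append("")
    lines.append(f"Article: {article.strip()}")
    lines.append("Labels (comma-separated, or 'none'):")
    return "\n".join(lines)


def parse_labels(generated: str, definitions: Dict[str, str] = LABEL_DEFINITIONS) -> List[int]:
    """Turn the model's free-text answer into a multi-hot vector.

    Unknown tokens (including 'none') are ignored, so they map to all zeros.
    """
    predicted = {token.strip().lower() for token in generated.split(",")}
    return [1 if name in predicted else 0 for name in definitions]


if __name__ == "__main__":
    print(build_prompt("Fighters attacked a checkpoint near the city."))
    # A generated answer would come from the tuned model; this string is a stand-in.
    print(parse_labels("military_operation, location_reference"))
```

Keeping the definitions in one dictionary means the prompt text and the order of the output vector stay in sync when a label is added or renamed.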