Open-Source Assessments of AI Capabilities: The Proliferation of AI Analysis Tools, Replicating Competitor Models, and the Zhousidun Dataset
Ritwik Gupta, Leah Walker, Eli Glickman, Raine Koizumi, Sarthak Bhatnagar, Andrew W. Reddie
TL;DR
This work presents an open-source methodology for assessing military AI capabilities using the Zhousidun dataset, a publicly released, Chinese-origin image collection of US and allied destroyers with annotated components. By replicating a near-state-of-the-art detector (YOLOv8l) on Zhousidun and evaluating it on both real-like test data ($mAP$ at $IoU = 0.50$) and synthetic scenes, the paper finds strong in-distribution performance ($mAP = 0.926$) but limited out-of-distribution effectiveness ($mAP \approx 0.45$, recall $\approx 0.26$, precision $\approx 0.87$). The high precision paired with low recall indicates that out-of-distribution detections are usually correct, but most labeled components are missed. The findings highlight the data-quality and domain-shift limitations of training on web-scraped imagery and demonstrate how synthetic data can bootstrap more robust detectors, informing open-source net assessment. Overall, the work proposes a repeatable framework for evaluating AI-enabled military capabilities using public data and open-source tools, with implications for strategic analysis and force planning.
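To make the reported metrics concrete, the following minimal sketch shows how precision and recall at an IoU threshold of 0.50 can be computed for a single image and a single class. It is an illustration only, not the paper's evaluation code: the helper names (`iou`, `precision_recall`), the box format, and the toy boxes are assumptions, predictions are assumed to be already sorted by descending confidence, and the sketch reports one operating point rather than the full $mAP$.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], truths: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedily match predictions to ground-truth boxes at an IoU threshold.

    Assumes `preds` is sorted by descending confidence and that all boxes
    belong to one class. Each ground-truth box can be matched at most once.
    """
    matched = set()
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(p, t)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Toy data (made up for illustration): four labeled components, one detection.
truths = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50), (60, 60, 70, 70)]
preds = [(1, 1, 10, 10)]
precision, recall = precision_recall(preds, truths)
print(precision, recall)
```

In this toy case the single detection is correct but three of the four labeled components go undetected, which is the same qualitative pattern as the out-of-distribution result: high precision with low recall.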
Abstract
The integration of artificial intelligence (AI) into military capabilities has become the norm for major military powers across the globe. Understanding how these AI models operate is essential for maintaining strategic advantages and ensuring security. This paper demonstrates an open-source methodology for analyzing military AI models through a detailed examination of the Zhousidun dataset, a Chinese-origin dataset that exhaustively labels critical components on American and allied destroyers. By replicating a state-of-the-art computer vision model on this dataset, we illustrate how open-source tools can be leveraged to assess and understand key military AI capabilities. This methodology offers a robust framework for evaluating the performance and potential of AI-enabled military capabilities, thereby improving the accuracy and reliability of strategic assessments.
