Datasheets for Machine Learning Sensors
Matthew Stewart, Yuke Zhang, Pete Warden, Yasmine Omri, Shvetank Prakash, Jacob Huckelberry, Joao Henrique Santos, Shawn Hymel, Benjamin Yeager Brown, Jim MacArthur, Nat Jeffries, Emanuel Moss, Mona Sloane, Brian Plancher, Vijay Janapa Reddi
TL;DR
ML sensors fuse sensing hardware with embedded ML, creating transparency and auditability requirements. To address this, the authors propose a dedicated datasheet framework for ML sensors organized into Standard Sensor, IoT, AI, and ML Sensor components, developed through stakeholder engagement and open-sourcing. They demonstrate the framework on two person-detection sensors (one open-source, one commercial) to show how end-to-end performance, environmental impact, privacy, and model–data details can be documented. Aligned with FAIR principles, the framework aims to support reproducibility, regulatory compliance, and responsible deployment of edge AI in diverse real-world contexts.
Abstract
Machine learning (ML) is becoming prevalent in embedded AI sensing systems. These "ML sensors" enable context-sensitive, real-time data collection and decision-making across diverse applications ranging from anomaly detection in industrial settings to wildlife tracking for conservation efforts. As such, there is a need to provide transparency in the operation of such ML-enabled sensing systems through comprehensive documentation. This is needed to enable their reproducibility, to address new compliance and auditing regimes mandated in regulation and industry-specific policy, and to verify and validate the responsible nature of their operation. To address this gap, we introduce the datasheet for ML sensors framework. We provide a comprehensive template, collaboratively developed in academia-industry partnerships, that captures the distinct attributes of ML sensors, including hardware specifications, ML model and dataset characteristics, end-to-end performance metrics, and environmental impacts. Our framework addresses the continuous streaming nature of sensor data, real-time processing requirements, and embeds benchmarking methodologies that reflect real-world deployment conditions, ensuring practical viability. Aligned with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), our approach enhances the transparency and reusability of ML sensor documentation across academic, industrial, and regulatory domains. To show the application of our approach, we present two datasheets: the first for an open-source ML sensor designed in-house and the second for a commercial ML sensor developed by industry collaborators, both performing computer vision-based person detection.
