A multi-language toolkit for the semi-automated checking of research outputs

Richard J. Preen; Maha Albashir; Simon Davy; Jim Smith

TL;DR

The paper tackles the challenge of scaling the safe release of research outputs from trusted research environments (TREs) by introducing SACRO, an open-source toolkit that semi-automates statistical disclosure control (SDC) through a principles-based framework. The core ACRO Python package performs automated checks and optional mitigations on outputs (tables, plots, models), while a GUI (SACRO Viewer) gives output checkers an auditable view for reviewing outputs and tracking decisions; wrappers for R and Stata share the same Python back-end. Key contributions include a lightweight, extensible design with YAML-based risk configuration, explicit reporting of why outputs are disclosive, and a workflow that turns SDC from a hand-off into a collaborative process between researchers and TRE staff. The approach aims to reduce staff workload, improve consistency across TREs, and make data-release processes more efficient, auditable, and transparent without compromising data privacy.
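As a minimal sketch of what a file-based risk configuration might involve, assuming a flat set of named thresholds, the Python below validates settings after they have been parsed from YAML (for example with PyYAML's `yaml.safe_load`). The class `RiskConfig`, the function `load_risk_config`, and every parameter name and default value are hypothetical illustrations, not ACRO's actual configuration schema.

```python
from dataclasses import dataclass, fields


# Hypothetical schema: names and defaults are illustrative assumptions,
# not the keys or values used by ACRO's own configuration file.
@dataclass(frozen=True)
class RiskConfig:
    safe_threshold: int = 10    # minimum number of contributors per cell
    safe_nk_n: int = 2          # n in the (n, k)-dominance rule
    safe_nk_k: float = 0.9      # k: largest share of the total the top n may hold
    safe_pratio_p: float = 0.1  # p in the p% rule, as a fraction


def load_risk_config(mapping: dict) -> RiskConfig:
    """Build a RiskConfig from an already-parsed mapping (e.g. YAML output).

    Missing keys fall back to the defaults above; unknown keys raise an
    error so that a misspelled parameter is not silently ignored.
    """
    known = {f.name for f in fields(RiskConfig)}
    unknown = set(mapping) - known
    if unknown:
        raise ValueError(f"unknown risk parameters: {sorted(unknown)}")
    return RiskConfig(**mapping)


# Usage: a TRE tightening only the frequency threshold.
cfg = load_risk_config({"safe_threshold": 5})
print(cfg)
```

Keeping thresholds in a data file rather than in code is one plausible way to let a TRE adjust its risk settings without modifying the package itself.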

Abstract

This article presents a free and open-source toolkit that supports the semi-automated checking of research outputs (SACRO) for privacy disclosure within secure data environments. SACRO is a framework that applies best-practice, principles-based statistical disclosure control (SDC) techniques on the fly as researchers conduct their analyses. SACRO is designed to assist human checkers rather than replace them, as current automated rules-based approaches seek to do. The toolkit consists of a lightweight Python package that sits on top of well-known analysis tools producing outputs such as tables, plots, and statistical models. This package adds functionality to (i) automatically identify potentially disclosive outputs using a range of commonly used disclosure tests; (ii) apply optional disclosure mitigation strategies on request; (iii) report the reasons for applying SDC; and (iv) produce simple summary documents that trusted research environment staff can use to streamline their workflow and maintain auditable records. This explicitly changes the dynamics so that SDC is done with researchers rather than to them, and enables more efficient communication with checkers. A graphical user interface supports human checkers by displaying the requested output and the results of the checks in an immediately accessible format, highlighting identified issues and potential mitigation options, and tracking the decisions made. The major analytical programming languages used by researchers (Python, R, and Stata) are supported through front-end packages that interface with the core Python back-end. Source code, packages, and documentation are available under the MIT license at https://github.com/AI-SDC/ACRO.
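The sketch below illustrates capabilities (i) to (iii) for a table of magnitudes in which each cell aggregates values from individual contributors. It applies three tests common in principles-based SDC (a minimum-frequency threshold, (n, k)-dominance, and a p% rule), optionally suppresses failing cells, and returns a JSON-serializable summary in the spirit of item (iv). The abstract does not say which tests ACRO runs, so this choice of tests, the threshold values, the status labels, and the names `cell_reasons` and `check_table` are all hypothetical; this is not ACRO's API.

```python
import json

# Illustrative parameters (assumptions, not ACRO defaults).
THRESHOLD = 10        # minimum contributors per cell
NK_N, NK_K = 2, 0.9   # top NK_N contributors may hold at most NK_K of the total
P_RATIO = 0.1         # remainder after the top two must be >= P_RATIO * largest


def cell_reasons(values: list[float]) -> list[str]:
    """Return the reasons, if any, why a cell aggregating `values` is disclosive."""
    reasons = []
    if len(values) < THRESHOLD:
        reasons.append(f"threshold: {len(values)} < {THRESHOLD} contributors")
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    if total > 0 and sum(ranked[:NK_N]) > NK_K * total:
        reasons.append(f"nk-dominance: top {NK_N} hold > {NK_K:.0%} of total")
    if ranked and total - sum(ranked[:2]) < P_RATIO * ranked[0]:
        reasons.append(f"p-ratio: remainder < {P_RATIO:.0%} of largest value")
    return reasons


def check_table(cells: dict[str, list[float]], suppress: bool = False) -> dict:
    """Check every cell, optionally mask failing ones, and summarize the result.

    Only primary suppression is sketched: a masked cell may still be
    recoverable from table margins, which a real tool must also handle.
    """
    released, reasons = {}, {}
    for label, values in cells.items():
        why = cell_reasons(values)
        if why:
            reasons[label] = why
        released[label] = None if (why and suppress) else sum(values)
    # Hypothetical status labels; the human checker makes the final decision.
    if not reasons:
        status = "pass"
    elif suppress:
        status = "review"
    else:
        status = "fail"
    return {"status": status, "suppressed": suppress,
            "table": released, "reasons": reasons}


cells = {
    "A": [5.0] * 12,            # many similar contributors
    "B": [100.0] + [1.0] * 9,   # one dominant contributor
    "C": [3.0, 4.0],            # too few contributors
}
record = check_table(cells, suppress=True)
print(json.dumps(record, indent=2))  # summary a checker could review
```

The p% test here follows one common formulation (the values left after removing the two largest contributors must be at least p% of the largest); other formulations exist, so definitions and thresholds should come from a TRE's own SDC policy.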
