Ontology for Healthcare Artificial Intelligence Privacy in Brazil
Tiago Andres Vaz, José Miguel Silva Dora, Luís da Cunha Lamb, Suzi Alves Camey
TL;DR
This paper presents ORHBR, an ontology that links Brazil's LGPD with multidisciplinary privacy concepts to semantically represent the anonymization of hospital records for AI research. Following a seven-step methodology implemented in Protégé, the authors define the domain scope, select knowledge, and create classes and relationships that describe study designs, data types, risk types, privacy models, preparation techniques, and performance metrics. Five real-world instances illustrate how ORHBR operationalizes anonymization planning across diverse study designs (cross-sectional, cohort, case-control, RCT, prospective), data formats, and privacy defenses, with concrete risk and information-loss metrics. The work aims to standardize privacy discourse, support regulatory compliance in Brazil, and let researchers compare anonymization approaches while preserving data utility for epidemiology and AI-enabled health research.
Abstract
This article details the creation of a novel domain ontology at the intersection of epidemiology, medicine, statistics, and computer science. Using the terminology defined by current legislation, it outlines a systematic approach to anonymizing hospital data in preparation for its use in Artificial Intelligence (AI) applications in healthcare. The development process consisted of seven pragmatic steps, including defining the scope, selecting knowledge, reviewing important terms, and constructing classes. These classes describe the designs used in epidemiological studies, machine learning paradigms, types of data and attributes, risks to which anonymized data may be exposed, privacy attacks, techniques to mitigate re-identification, privacy models, and metrics for measuring the effects of anonymization. The article concludes by demonstrating the practical implementation of this ontology in hospital settings for the development and validation of AI.
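As a rough illustration of how the class groups listed in the abstract could frame an anonymization plan, the sketch below models them in plain Python. It is not the ontology itself, which is built in Protégé: the group identifiers, the `AnonymizationPlan` type, and the example values are hypothetical names chosen for this sketch and are not taken from ORHBR.

```python
from dataclasses import dataclass, field

# Groups of classes named in the abstract. These identifiers are hypothetical
# labels for this sketch, not the ontology's actual class names.
CLASS_GROUPS = (
    "study_design",
    "learning_paradigm",
    "data_type",
    "risk",
    "privacy_attack",
    "mitigation_technique",
    "privacy_model",
    "metric",
)


@dataclass
class AnonymizationPlan:
    """One instance: a study described by a value for each class group."""

    name: str
    choices: dict = field(default_factory=dict)

    def missing_groups(self):
        """Return the class groups this plan has not filled in yet."""
        return [group for group in CLASS_GROUPS if group not in self.choices]

    def is_complete(self):
        """A plan is complete once every class group has a value."""
        return not self.missing_groups()


# Illustrative values only; they are not one of the paper's five instances.
plan = AnonymizationPlan(
    name="example cohort study",
    choices={
        "study_design": "cohort",
        "data_type": "tabular",
        "privacy_model": "k-anonymity",
    },
)
```

Calling `plan.missing_groups()` lists the groups still to be decided, which mirrors the idea of using a shared vocabulary to check that an anonymization plan covers every aspect the ontology names.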
