Frame Semantic Patterns for Identifying Underreporting of Notifiable Events in Healthcare: The Case of Gender-Based Violence
Lívia Dutra, Arthur Lorenzi, Laís Berno, Franciany Campos, Karoline Biscardi, Kenneth Brown, Marcelo Viridiano, Frederico Belcavello, Ely Matos, Olívia Guaranha, Erik Santos, Sofia Reinach, Tiago Timponi Torrent
TL;DR
The paper tackles underreporting of notifiable health events, notably gender-based violence (GBV), in e-medical records by employing frame-semantic modeling with FrameNet Brasil to define domain-specific frames and 8 GBV patterns. It builds and annotates domain frames for Healthcare and Violence, trains a specialized semantic parser, and validates eight GBV patterns on a large Brazilian Portuguese corpus, achieving a precision of $0.726$ and $0.600$ for fine-grained classification. The contributions include a transparent, low-carbon, language-agnostic, task-agnostic pipeline suitable for adaptation to other health surveillance contexts, along with an annotated dataset and error analyses that inform pattern refinement. Practically, the approach enables explainable NLP support for public health authorities to detect underreported GBV signals in resource-constrained settings while safeguarding privacy and ethics.
Abstract
We introduce a methodology for the identification of notifiable events in the domain of healthcare. The methodology harnesses semantic frames to define fine-grained patterns and search them in unstructured data, namely, open-text fields in e-medical records. We apply the methodology to the problem of underreporting of gender-based violence (GBV) in e-medical records produced during patients' visits to primary care units. A total of eight patterns are defined and searched on a corpus of 21 million sentences in Brazilian Portuguese extracted from e-SUS APS. The results are manually evaluated by linguists and the precision of each pattern measured. Our findings reveal that the methodology effectively identifies reports of violence with a precision of 0.726, confirming its robustness. Designed as a transparent, efficient, low-carbon, and language-agnostic pipeline, the approach can be easily adapted to other health surveillance contexts, contributing to the broader, ethical, and explainable use of NLP in public health systems.
