HICH Image/Text (HICH-IT): Comprehensive Text and Image Datasets for Hypertensive Intracerebral Hemorrhage Research
Jie Li, Yulong Xia, Tongxin Yang, Fenglin Cai, Miao Wei, Zhiwei Zhang, Li Jiang
TL;DR
The paper presents HICH-IT, a Chinese multi-modal dataset for hypertensive intracerebral hemorrhage that pairs head CT images with EMR-derived text annotations to support image segmentation and natural language processing for clinical data. It details data sources from multiple centers, four imaging annotations (hematoma, brain midline, left and right ventricles), and BERT-based entity tagging for EMRs, all stored in standard formats (NIfTI for images, TXT for text) and visualizable in 3D-Slicer. Preliminary experiments using U-Net for segmentation and NER models for text demonstrate strong performance, underscoring the dataset’s value for multi-modal AI in medical imaging and clinical text analysis. The authors release pretrained models and the dataset to foster open research and plan ongoing updates to expand data quantity and variety, aiming to improve diagnostic accuracy and treatment efficiency for HICH.
Abstract
In this paper, we introduce HICH-IT, a new dataset for hypertensive intracerebral hemorrhage (HICH) that includes both electronic medical records (EMRs) and head CT images. The dataset is designed to improve the accuracy of artificial intelligence in the diagnosis and treatment of HICH. Building on standard text and image data, it adds annotations to the EMRs that extract key content from the text, and it divides the imaging annotations into four types: brain midline, hematoma, left cerebral ventricle, and right cerebral ventricle. HICH-IT aims to be a foundational dataset for feature learning in image segmentation and named entity recognition tasks. To better understand the dataset, we trained deep learning algorithms on it and observed their performance. The pretrained models have been released at both www.daip.club and github.com/Deep-AI-Application-DAIP. The dataset has been uploaded to https://github.com/CYBUS123456/HICH-IT-Datasets.
Index Terms: HICH, deep learning, intraparenchymal hemorrhage, named entity recognition, novel dataset
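As a rough illustration of how segmentation quality on the four imaging annotation types might be scored, the sketch below computes a per-label Dice coefficient between a predicted and a reference label map. Dice is a common segmentation metric, but this section does not state which metric the authors use. The integer label ids in `LABELS` and the function name `dice_per_label` are hypothetical and are not taken from the dataset. The inputs are flattened voxel label lists, so no NIfTI reader is needed for the example.

```python
from typing import Dict, Sequence

# Hypothetical label ids for the four annotation types.
# The actual encoding used in the HICH-IT files may differ.
LABELS: Dict[str, int] = {
    "hematoma": 1,
    "brain_midline": 2,
    "left_ventricle": 3,
    "right_ventricle": 4,
}


def dice_per_label(
    pred: Sequence[int],
    truth: Sequence[int],
    labels: Dict[str, int] = LABELS,
) -> Dict[str, float]:
    """Return the Dice coefficient 2|P ∩ T| / (|P| + |T|) for each label.

    `pred` and `truth` are flattened voxel label maps of equal length.
    A label absent from both maps is scored 1.0 (a convention chosen here).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")

    scores: Dict[str, float] = {}
    for name, label_id in labels.items():
        pred_count = sum(1 for v in pred if v == label_id)
        truth_count = sum(1 for v in truth if v == label_id)
        overlap = sum(
            1 for p, t in zip(pred, truth) if p == label_id and t == label_id
        )
        total = pred_count + truth_count
        scores[name] = 1.0 if total == 0 else 2.0 * overlap / total
    return scores


if __name__ == "__main__":
    # Toy 8-voxel example; 0 is background.
    truth = [0, 1, 1, 2, 3, 4, 4, 0]
    pred = [0, 1, 0, 2, 3, 3, 4, 0]
    for name, score in dice_per_label(pred, truth).items():
        print(f"{name}: {score:.3f}")
```

In practice the label maps would be read from the NIfTI files and flattened before being passed in; that loading step is omitted here because it depends on the file layout of the release.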
