All-in-one platform for AI R&D in medical imaging, encompassing data collection, selection, annotation, and pre-processing

Changhee Han, Kyohei Shibano, Wataru Ozaki, Keishiro Osaki, Takafumi Haraguchi, Daisuke Hirahara, Shumon Kimura, Yasuyuki Kobayashi, Gento Mogi

TL;DR

This work established the first commercial medical imaging platform that prepares and provides ready-to-use datasets for medical AI R&D, both by offering these datasets to companies and by using them as additional training data to develop tailored AI solutions.

Abstract

Deep Learning is advancing medical imaging Research and Development (R&D), leading to the frequent clinical use of Artificial Intelligence/Machine Learning (AI/ML)-based medical devices. However, two challenges stand in the way of further AI R&D: 1) significant data imbalance, with most data coming from Europe/America and under 10% from Asia, even though Asia accounts for 60% of the global population; and 2) the substantial time and investment needed to curate proprietary datasets for commercial use. In response, we established the first commercial medical imaging platform, encompassing steps such as 1) data collection, 2) data selection, 3) annotation, and 4) pre-processing. Moreover, we focus on harnessing under-represented data from Japan and broader Asia, including Computed Tomography, Magnetic Resonance Imaging, and Whole Slide Imaging scans. Using the collected data, we are preparing and providing ready-to-use datasets for medical AI R&D by 1) offering these datasets to AI firms, biopharma, and medical device makers and 2) using them as training/test data to develop tailored AI solutions for such entities. We also aim to integrate Blockchain for data security and plan to synthesize rare disease data via generative AI. DataHub Website: https://medical-datahub.ai/

Paper Structure (6 sections, 3 figures)

Figures (3)

  • Figure 1: Overview of our business ecosystem: hospitals and clinics provide anonymized medical images and clinical data, possibly with annotations, to a cloud platform; Callisto then meticulously selects, annotates (with a rigorous double-check by an expert radiologist/pathologist), and pre-processes them to prepare ready-to-use datasets for medical AI R&D; finally, Callisto 1) offers these datasets to AI companies, biopharma, and medical device manufacturers, and 2) leverages them as training/test data to develop customized AI solutions for such entities.
  • Figure 2: (a) Percentage of imaging biobanks per continent [Gabelloni22]; (b) Count of AI/ML-based medical devices per country. Although recent data for Europe is unavailable, as of 2019, there were more CE-marked devices than FDA-cleared ones.
  • Figure 3: Preparation process of medical imaging datasets for AI training/test: 1) data collection [vallieres17]; 2) data selection [rampa2020]; 3) annotation [Cheng2019]; 4) pre-processing [deFarias22].