From PhysioNet to Foundation Models -- A history and potential futures

Gari D. Clifford

TL;DR

This article identifies the most promising future directions for the PhysioNet Resource, and more generally, the growing issues and opportunities around dissemination and use of massive physiological databases, associated open access code, and public competitions, along with potential solutions to the key issues facing the field.

Abstract

Over the last 35 years, the sharing of medical data and models for research has evolved from sneakernet to the internet - from mailing magnetic tapes and compact discs of a handful of well-curated recordings, to the high-speed download of relatively comprehensive hospital databases. More recently, the fervor around the potential for modern machine learning and 'AI' to catapult us into the next industrial revolution has led to a seemingly insatiable desire to pump almost any source of data into large models. Although this has great potential, it also presents a whole set of new challenges. In this article I examine these trends over the last 30 years, drawing on examples from cardiology, one of the oldest data-intensive fields that is undergoing a renaissance via machine learning. From the early days of computerized cardiology, the Research Resource for Complex Physiologic Signals (PhysioNet) has been at the cutting edge of this field. This article, therefore, includes much of the Resource's history and the contributions drawn from 25 years of firsthand experience of co-developing elements of the Resource with its founders. I identify the most promising future directions for the PhysioNet Resource, and more generally, the growing issues and opportunities around dissemination and use of massive physiological databases, associated open access code, and public competitions, along with potential solutions to the key issues facing our field. Topics range from how we should approach foundation models in the context of the rapidly growing AI carbon footprint, to the potential of Tiny-ML and edge computing. I also cover issues around prizes and incentives, funding models, and scientific repeatability, as well as how we might address these issues by leveraging the PhysioNet Challenges, consistent with the philosophy of open-access from the early days of the PhysioNet Resource.

Paper Structure (21 sections, 6 figures, 2 tables)

This paper contains 21 sections, 6 figures, 2 tables.

Introduction
Some history on open access science
The PhysioNet Resource
The PhysioNet Challenges
Comparisons with other open biomedical data initiatives
The KDD Cup
Sage Bionetworks DREAM Challenges
Kaggle
Prizes and Incentives
Tutorials, publishing, trust, and misaligned incentives
Recent Developments using 'AI'
Foundation and Empire - the New Colonialism or Democratization of ML
The Carbon Footprint of Success
A Quick Aside on Interpretability
The Future? Large Models, Public Data and the role of PhysioNet and other public resources
...and 6 more sections

Figures (6)

Figure 1: The MIT-BIH Arrhythmia Database on CDROM. Photo courtesy of Juan Pablo Martínez.
Figure 2: Roger Mark and George Moody in the early days of the development of the MIT-BIH Arrhythmia Database. Source: courtesy of George Moody and Roger Mark. CC BY-SA 4.0.
Figure 3: George Moody showing me some of the historical artifacts from his office that predated PhysioNet. Source: Author's own photo. CC BY-SA 4.0.
Figure 4: The 'Faces of PhysioNet' flyer used in the early 2000's, capturing the key contributors in the early years of the Resource. Image generated by Ken Pierce in the LCP. CC BY-SA 4.0.
Figure 5: The original homepage of PhysioNet.org, captured from the Internet Archive Wayback Machine on Dec 4, 1999. CC BY-SA 4.0.
...and 1 more figure
