PhilHumans: Benchmarking Machine Learning for Personal Health
Vadim Liventsev, Vivek Kumar, Allmin Pradhap Singh Susaiyah, Zixiu Wu, Ivan Rodin, Asfand Yaar, Simone Balloccu, Marharyta Beraziuk, Sebastiano Battiato, Giovanni Maria Farinella, Aki Härmä, Rim Helaoui, Milan Petkovic, Diego Reforgiato Recupero, Ehud Reiter, Daniele Riboni, Raymond Sterling
TL;DR
PhilHumans addresses the lack of standardized benchmarks in machine learning for personal health by presenting a holistic suite spanning tabular, vision, and natural language domains across therapy, coaching, and clinical settings. It formalizes datasets and evaluation protocols (e.g., MIMIC-IV-Ext-SEQ, Auto-ALS, PH-Ego, Imagym, AnnoMI, and an insight mining task) and provides baseline results to anchor future research. The framework enables cross-domain evaluation, reproducibility, and potential transfer between modalities, highlighting both the promise of these methods and the gaps that remain before real-world deployment. By delivering a reproducible, multi-faceted benchmark, PhilHumans aims to accelerate the methodological development and adoption of ML tools in personalized healthcare.
Abstract
The use of machine learning in healthcare has the potential to improve patient outcomes as well as to broaden the reach and affordability of healthcare. The history of other application areas indicates that strong benchmarks are essential for the development of intelligent systems. We present Personal Health Interfaces Leveraging HUman-MAchine Natural interactions (PhilHumans), a holistic suite of benchmarks for machine learning across different healthcare settings (talk therapy, diet coaching, emergency care, intensive care, and obstetric sonography) as well as different learning settings, such as action anticipation, timeseries modeling, insight mining, language modeling, computer vision, reinforcement learning, and program synthesis.
