PhilHumans: Benchmarking Machine Learning for Personal Health

Vadim Liventsev; Vivek Kumar; Allmin Pradhap Singh Susaiyah; Zixiu Wu; Ivan Rodin; Asfand Yaar; Simone Balloccu; Marharyta Beraziuk; Sebastiano Battiato; Giovanni Maria Farinella; Aki Härmä; Rim Helaoui; Milan Petkovic; Diego Reforgiato Recupero; Ehud Reiter; Daniele Riboni; Raymond Sterling

TL;DR

PhilHumans addresses the lack of standardized benchmarks in machine learning for personal health by presenting a holistic suite spanning tabular, vision, and natural language domains across therapy, coaching, and clinical settings. It formalizes datasets and evaluation protocols (e.g., MIMIC-IV-Ext-SEQ, Auto-ALS, PH-Ego, Imagym, AnnoMI, and insight mining) and provides baseline results to anchor future research. The framework enables cross-domain evaluation, reproducibility, and potential transfer between modalities, highlighting both the promise and gaps for real-world deployment. By delivering a reproducible, multi-faceted benchmark, PhilHumans aims to accelerate methodological development and adoption of ML tools in personalized healthcare.

Abstract

The use of machine learning in healthcare has the potential to improve patient outcomes and to broaden the reach and affordability of healthcare. The history of other application areas indicates that strong benchmarks are essential for the development of intelligent systems. We present Personal Health Interfaces Leveraging HUman-MAchine Natural interactions (PhilHumans), a holistic suite of benchmarks for machine learning across different healthcare settings (talk therapy, diet coaching, emergency care, intensive care, and obstetric sonography) as well as different learning settings, such as action anticipation, time-series modeling, insight mining, language modeling, computer vision, reinforcement learning, and program synthesis.

Paper Structure (22 sections, 7 equations, 5 figures, 6 tables)

Introduction
Benchmarks
Tabular perspective
MIMIC-IV-Ext-SEQ
Auto-ALS
Vision perspective
Human-Robot interaction
PH-Ego
Imagym
Natural language perspective
Diet coaching
AnnoMI
Insight mining
Evaluations
MIMIC-IV-Ext-SEQ
...and 7 more sections

Figures (5)

Figure 1: Virtu-ALS
Figure 2: (left) Robot responding to a human call by exploring the environment with the help of multiple global goals (1-6) to locate and reach the human. (center) Robot's observations upon reaching each global goal (1-6) during its exploration of the environment. (right) Robot's final observation upon successfully reaching the human at an appropriate angle, depending on the human's activity, to initiate a conversation.
Figure 3: An example sequence of frames from the PH-Ego dataset capturing the coffee preparation and sugar-adding actions.
Figure 4: Two examples of the agent's observation at different positions of the probe.
Figure 5: Challenges in conversational technologies and applications for health self-management.
