Mapping fNIRS Signals to Agent Performance: Toward Reinforcement Learning from Neural Feedback
Julia Santaniello, Matthew Russell, Benson Jiang, Donatello Sassaroli, Robert Jacob, Jivko Sinapov
TL;DR
This work addresses learning agent behavior from implicit human signals by mapping fNIRS-derived neural activity to agent performance during human-agent interactions. It proposes NEURO-LOOP, a passive BCI framework that leverages fNIRS for RLHF-style training and releases a synchronized neural/task dataset across three OpenAI Gym domains. Classifiers predict discrete performance levels with average $F1$ scores of $0.67$ (binary) and $0.46$ (multi-class), while regressors quantify continuous deviation from near-optimal policies; fine-tuning with limited subject data provides substantial gains ($+17\%$ binary, $+41\%$ multi-class). Results show feasibility but limited cross-subject transfer, underscoring the need for calibration and further work on real-time brain-driven RLHF and multi-modal signals.
Abstract
Reinforcement Learning from Human Feedback (RLHF) is a methodology that aligns agent behavior with human preferences by integrating human feedback into the agent's training process. We propose a framework that employs passive Brain-Computer Interfaces (BCI) to guide agent training from implicit neural signals. We present and release a novel dataset of functional near-infrared spectroscopy (fNIRS) recordings collected from 25 human participants across three domains: a Pick-and-Place Robot, Lunar Lander, and Flappy Bird. We train classifiers to predict levels of agent performance (optimal, sub-optimal, or worst-case) from windows of preprocessed fNIRS feature vectors. Averaged across conditions and domains, these classifiers achieve an F1 score of 67% for binary classification and 46% for multi-class classification. We also train regressors to predict the degree of deviation between an agent's chosen action and a set of near-optimal policies, providing a continuous measure of performance. We evaluate cross-subject generalization and demonstrate that fine-tuning pre-trained models with a small sample of subject-specific data increases average F1 scores by 17% and 41% for binary and multi-class models, respectively. Our work demonstrates that mapping implicit fNIRS signals to agent performance is feasible and that subject-specific fine-tuning improves it, laying the foundation for future brain-driven RLHF systems.
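As a rough illustration of the windowing and scoring steps mentioned in the abstract, the sketch below splits a multi-channel signal into overlapping windows, turns each window into a feature vector, and computes a class-averaged F1 score. It is not the authors' pipeline: the function names, the per-channel mean and start-to-end change features, the window and step sizes, and the use of macro averaging for F1 are all assumptions made for the example.

```python
from statistics import mean


def sliding_windows(samples, window, step):
    """Split a list of per-timestep channel vectors into overlapping windows.

    `window` and `step` are counted in samples (hypothetical parameters).
    """
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, step)]


def window_features(win):
    """Build one feature vector per window (assumed features, not the paper's).

    Concatenates the per-channel mean with the per-channel change between
    the first and last sample of the window.
    """
    n_channels = len(win[0])
    means = [mean(row[c] for row in win) for c in range(n_channels)]
    deltas = [win[-1][c] - win[0][c] for c in range(n_channels)]
    return means + deltas


def f1_per_class(y_true, y_pred, label):
    """F1 for a single class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)
```

In use, a recording would be passed through `sliding_windows`, each window mapped through `window_features`, and the resulting vectors fed to any classifier whose predictions over the labels optimal, sub-optimal, and worst-case are then scored with `macro_f1`.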
