Mapping fNIRS Signals to Agent Performance: Toward Reinforcement Learning from Neural Feedback

Julia Santaniello, Matthew Russell, Benson Jiang, Donatello Sassaroli, Robert Jacob, Jivko Sinapov

TL;DR

This work addresses learning agent behavior from implicit human signals by mapping fNIRS-derived neural activity to agent performance during human-agent interactions. It proposes NEURO-LOOP, a passive BCI framework that leverages fNIRS for RLHF-style training and releases a synchronized neural/task dataset across three OpenAI Gym domains. Classifiers predict discrete performance levels with average $F1$ scores of $0.67$ (binary) and $0.46$ (multi-class), while regressors quantify continuous deviation from near-optimal policies; fine-tuning with limited subject data provides substantial gains ($+17\%$ binary, $+41\%$ multi-class). Results show feasibility but limited cross-subject transfer, underscoring the need for calibration and further work on real-time brain-driven RLHF and multi-modal signals.

Abstract

Reinforcement Learning from Human Feedback (RLHF) is a methodology that aligns agent behavior with human preferences by integrating human feedback into the agent's training process. We introduce a possible framework that employs passive Brain-Computer Interfaces (BCI) to guide agent training from implicit neural signals. We present and release a novel dataset of functional near-infrared spectroscopy (fNIRS) recordings collected from 25 human participants across three domains: a Pick-and-Place Robot, Lunar Lander, and Flappy Bird. We train classifiers to predict levels of agent performance (optimal, sub-optimal, or worst-case) from windows of preprocessed fNIRS feature vectors, achieving an average F1 score of 67% for binary classification and 46% for multi-class models averaged across conditions and domains. We also train regressors to predict the degree of deviation between an agent's chosen action and a set of near-optimal policies, providing a continuous measure of performance. We evaluate cross-subject generalization and demonstrate that fine-tuning pre-trained models with a small sample of subject-specific data increases average F1 scores by 17% and 41% for binary and multi-class models, respectively. Our work demonstrates that mapping implicit fNIRS signals to agent performance is feasible and can be improved, laying the foundation for future brain-driven RLHF systems.
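The F1 figures quoted above summarize per-class precision and recall in a single number. As a rough illustration only, the sketch below computes per-class F1 and its unweighted (macro) average for the three performance labels named in the abstract. The abstract does not say which averaging scheme was used, so macro averaging is an assumption here, and the labels in `y_true` and `y_pred` are toy values, not data from the paper.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels for illustration only.
y_true = ["optimal", "optimal", "sub-optimal", "worst-case", "sub-optimal", "worst-case"]
y_pred = ["optimal", "sub-optimal", "sub-optimal", "worst-case", "optimal", "worst-case"]

print(f1_per_class(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

With three classes, a classifier that always predicts one label scores far lower on macro F1 than on accuracy, which is one reason multi-class F1 values tend to sit below binary ones.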

Paper Structure

This paper contains 9 sections, 3 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: NEURO-LOOP: A high-level diagram of the proposed NEURO-LOOP framework. This paper addresses the neural classification problem with the intent to apply our trained models to the proposed pipeline.
  • Figure 2: Setup: Participants sat 24 inches away from a computer screen. The fNIRS device is a headband that shines pulsating near-infrared light into the prefrontal cortex (PFC) to detect changes in blood flow.
  • Figure 3: Training Paradigms: Diagram of the three training paradigms and the experimental protocol. a) Single-subject models are trained on a set of data from one participant and are evaluated using withheld data from the same participant. b) Multi-subject models are trained on a set of participants and are evaluated using withheld data from the same set. c) Fine-tuned models are multi-subject models calibrated with a fraction of a target participant's data.
  • Figure 4: Multi-Subject Classification Performance: This figure illustrates classification performance (F1) for binary and multi-class models. Binary models attempted to classify optimal vs. sub-optimal behavior, while multi-class models attempted to differentiate between optimal, sub-optimal or worst-case agent behavior.
  • Figure 5: Agents and Domains: Robot Fetch and Place, Flappy Bird and Lunar Lander, respectively.
  • ...and 3 more figures
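
The three training paradigms in the Figure 3 caption differ mainly in how data is partitioned. The sketch below is one possible reading of that caption, not the authors' code. The function names, the split fractions, the random shuffling, and the choice to exclude the target participant from pre-training are all assumptions, and the "windows" are placeholder tuples rather than real fNIRS features.

```python
import random


def split(samples, held_out_fraction, rng):
    """Shuffle a copy of `samples`; return (remainder, held_out)."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_held = max(1, int(len(shuffled) * held_out_fraction))
    return shuffled[n_held:], shuffled[:n_held]


def single_subject(data, subject, rng, test_fraction=0.2):
    """a) Train and test on withheld data from the same participant."""
    return split(data[subject], test_fraction, rng)


def multi_subject(data, subjects, rng, test_fraction=0.2):
    """b) Pool participants; test on withheld data from that same set."""
    train, test = [], []
    for subject in subjects:
        subject_train, subject_test = split(data[subject], test_fraction, rng)
        train += subject_train
        test += subject_test
    return train, test


def fine_tuned(data, target, rng, calibration_fraction=0.1):
    """c) Pre-train on other participants, calibrate on a fraction of the target.

    Excluding the target from pre-training is an assumption of this sketch.
    """
    pretrain = [w for subject, windows in data.items()
                if subject != target for w in windows]
    evaluate, calibrate = split(data[target], calibration_fraction, rng)
    return pretrain, calibrate, evaluate


# Hypothetical data: 3 participants with 20 placeholder windows each.
data = {f"P{i:02d}": [(f"P{i:02d}", w) for w in range(20)] for i in range(1, 4)}
rng = random.Random(0)

train, test = single_subject(data, "P01", rng)
print(len(train), len(test))

pretrain, calibrate, evaluate = fine_tuned(data, "P03", rng)
print(len(pretrain), len(calibrate), len(evaluate))
```

Random shuffling is used here only for brevity. If windows overlap in time, a real split would likely need to hold out contiguous blocks or whole trials so that near-identical samples do not land on both sides.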