Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology

Romy Beauté, David J. Schwartzman, Guillaume Dumas, Jennifer Crook, Fiona Macpherson, Adam B. Barrett, Anil K. Seth

TL;DR

The paper addresses the limitation of predefined questionnaires in capturing the full range of stroboscopically induced phenomenology by analyzing open-ended Dreamachine reports with a data-driven MOSAIC pipeline. It combines BERTopic topic modelling on sentence embeddings with Llama-3-8B-Instruct automatic labeling to identify latent experiential topics from 862 sentences across two Dreamachine variants. The HS and DL analyses reveal a spectrum from simple visual hallucinations to complex imagery and altered states, including substantial unassigned responses that underscore idiosyncratic experiences. The work demonstrates a practical, open-source workflow for analyzing subjective reports and highlights the potential to map phenomenological categories to neural data in future neurophenomenology research.

Abstract

Stroboscopic light stimulation (SLS) on closed eyes typically induces simple visual hallucinations (VHs), characterised by vivid, geometric and colourful patterns. A dataset of 862 sentences, extracted from 422 open subjective reports, was recently compiled as part of the Dreamachine programme (Collective Act, 2022), an immersive multisensory experience that combines SLS and spatial sound in a collective setting. Although open reports extend the range of reportable phenomenology, their analysis presents significant challenges, particularly in systematically identifying patterns. To address this challenge, we implemented a data-driven approach leveraging Large Language Models and Topic Modelling to uncover and interpret latent experiential topics directly from the Dreamachine's text-based reports. Our analysis confirmed the presence of simple VHs typically documented in scientific studies of SLS, while also revealing experiences of altered states of consciousness and complex hallucinations. Building on these findings, our computational approach expands the systematic study of subjective experience by enabling data-driven analyses of open-ended phenomenological reports, capturing experiences not readily identified through standard questionnaires. By revealing rich and multifaceted aspects of experiences, our study broadens our understanding of stroboscopically-induced phenomena while highlighting the potential of Natural Language Processing and Large Language Models in the emerging field of computational (neuro)phenomenology. More generally, this approach provides a practically applicable methodology for uncovering subtle hidden patterns of subjective experience across diverse research domains.

Paper Structure

This paper contains 22 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Topic modelling pipeline architecture comprising three phases: text processing (NLTK for sentence segmentation, SBERT for dense embeddings), topic modelling (UMAP for dimensionality reduction, HDBSCAN for clustering, Count Vectorisation and c-TF-IDF for term weighting), and topic refinement (Coherence Metrics for quality assessment, Probability Threshold for outlier handling, Llama 3 for label generation)
  • Figure 2: Topic Representations of HS Dreamachine experiences. Two-dimensional embedding visualisation of experiential topics (n=13) derived from HS Dreamachine participant reflections (n=680 sentences). Each point represents a sentence, colour-coded by its dominant topic. Spatial proximity indicates semantic similarity between reports, computed using BERTopic’s transformer-based embeddings. The visualisation reveals distinct phenomenological domains: visual-perceptual (turquoise-blue), altered states of consciousness (purple), emotional-meditative experiences (red-orange), and multisensory/autobiographical experiences (green). Topic labels were generated using Llama-3-8b-instruct, which interpreted the underlying semantic clusters. Points are clustered by semantic similarity, with overlapping regions suggesting shared phenomenological features.
  • Figure 3: Hierarchical clustering of HS Dreamachine experiences. Dendrogram showing semantic relationships between 13 topics identified from HS Dreamachine reflections. Topics were extracted using BERTopic and labelled with Llama-3-8b-instruct. Vertical distances represent cosine dissimilarity between topics. The clustering reveals distinct phenomenological domains: autobiographical-spiritual experiences (green cluster: including childhood memories, spiritual experiences, and multisensory experiences), visual phenomena (red cluster: optical patterns, visual hallucinations) and altered states experiences (blue-turquoise cluster: psychedelic experiences, mindfulness meditation, peaceful states, lucid dreaming). This hierarchical view complements the spatial relationships shown in Figure 2.
  • Figure 4: Topic Representations of DL Dreamachine experiences. Two-dimensional embedding visualisation of experiential topics (n=7) derived from DL Dreamachine participant reflections (n=182 sentences). Each point represents a sentence, colour-coded by its dominant topic. Spatial proximity indicates semantic similarity between reports, computed using BERTopic's transformer-based embeddings. The visualisation reveals distinct experiential domains: sensory-perceptual (purple), mental imagery (red-orange), mindfulness states (green-blue), and out-of-body experiences (grey). Topic labels were generated using Llama-3-8b-instruct, which interpreted the underlying semantic clusters. Points are clustered by semantic similarity, with overlapping regions suggesting shared phenomenological features.
  • Figure 5: Hierarchical clustering of DL Dreamachine experiences. Dendrogram showing semantic relationships between 7 topics identified from DL Dreamachine reflections. Topics were extracted using BERTopic and labelled with Llama-3-8b-instruct.
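
The Figure 1 caption names c-TF-IDF as the term-weighting step that follows clustering. As a rough illustration of that one step, the sketch below weights terms per cluster using the commonly documented class-based TF-IDF formulation: term frequency within a cluster, multiplied by log(1 + A / f), where A is the average token count per cluster and f is the term's frequency across all clusters. This is a minimal standard-library sketch, not the authors' implementation: the function names `c_tf_idf` and `top_terms`, the whitespace tokenisation, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def c_tf_idf(cluster_tokens):
    """Class-based TF-IDF: weight each term within each cluster.

    cluster_tokens maps a cluster id to the list of tokens from all
    sentences assigned to that cluster (each cluster is treated as one
    document). Assumes at least one cluster is given.
    """
    # Term counts per cluster, and summed across all clusters.
    counts = {c: Counter(toks) for c, toks in cluster_tokens.items()}
    total = Counter()
    for counter in counts.values():
        total.update(counter)

    # Average number of tokens per cluster (the "A" term).
    avg_tokens = sum(total.values()) / len(counts)

    weights = {}
    for c, counter in counts.items():
        n_tokens = sum(counter.values())
        weights[c] = {
            # Normalised in-cluster frequency times the class-based IDF.
            term: (freq / n_tokens) * math.log(1 + avg_tokens / total[term])
            for term, freq in counter.items()
        }
    return weights


def top_terms(weights, k=3):
    """Return the k highest-weighted terms per cluster (ties broken A-Z)."""
    return {
        c: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for c, w in weights.items()
    }


# Hypothetical toy clusters, not taken from the Dreamachine dataset.
clusters = {
    "visual": "colours patterns geometric colours shapes".split(),
    "calm": "calm peaceful relaxed calm patterns".split(),
}
weights = c_tf_idf(clusters)
print(top_terms(weights, k=2))
```

Terms that occur in several clusters (here, "patterns") receive a lower weight than terms concentrated in one cluster, which is why the highest-weighted terms serve as a compact description of a topic before an LLM is asked to label it.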