Infinite hierarchical contrastive clustering for personal digital envirotyping

Ya-Yun Huang, Joseph McClernon, Jason A. Oliver, Matthew M. Engelhard

TL;DR

This work tackles the challenge of envirotyping by automatically clustering daily-environment images into an unbounded set of personal environments and their higher-level types. It introduces infinite hierarchical contrastive clustering (IH-CC), combining a stick-breaking prior on cluster probabilities with a participant-specific head to induce meaningful intra- and inter-participant structure, trained end-to-end with a composite loss. On two cohorts, IH-CC identifies coherent environment clusters, reveals environment-type groupings shared across participants, and links environment clusters to smoking-related health outcomes, illustrating the method's potential to advance envirotyping. The approach offers a scalable, data-driven pathway to quantify how daily environments influence health and behavior, enabling environment-aware interventions.

Abstract

Daily environments have a profound influence on our health and behavior. Recent work has shown that digital envirotyping, in which computer vision is applied to images of daily environments taken during ecological momentary assessment (EMA), can identify meaningful relationships between environmental features and health outcomes of interest. To study such effects systematically at the individual level, it is helpful to group images into the distinct environments encountered in an individual's daily life; these may then be analyzed, further grouped into related environments with similar features, and linked to health outcomes. Here we introduce infinite hierarchical contrastive clustering to address this challenge. Building on the established contrastive clustering framework, our method a) allows an arbitrary number of clusters without requiring the full Dirichlet Process machinery by placing a stick-breaking prior on predicted cluster probabilities; and b) encourages distinct environments to form well-defined sub-clusters within each cluster of related environments by incorporating a participant-specific prediction loss. Our experiments show that our model effectively identifies distinct personal environments and groups these environments into meaningful environment types. We then illustrate how the resulting clusters can be linked to various health outcomes, highlighting the potential of our approach to advance the envirotyping paradigm.
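
As a rough illustration of the stick-breaking idea mentioned above, the sketch below builds cluster weights with the standard truncated stick-breaking construction: each fraction v_k breaks off a share of whatever stick remains, so pi_k = v_k * prod_{j<k} (1 - v_j). This is not the paper's implementation. The function names, the concentration parameter `alpha`, and the truncation level are assumptions made for the example, and how IH-CC applies the prior to predicted cluster probabilities is not specified in this summary.

```python
import random


def stick_breaking_weights(fractions):
    """Turn break fractions v_k in [0, 1] into weights pi_k = v_k * prod_{j<k} (1 - v_j)."""
    weights = []
    remaining = 1.0  # length of stick not yet assigned to any cluster
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def sample_truncated_weights(alpha, truncation, seed=0):
    """Sample weights from a truncated stick-breaking prior (hypothetical helper).

    Fractions are drawn from Beta(1, alpha); the final fraction is fixed at 1
    so that the truncated weights sum to one.
    """
    rng = random.Random(seed)
    fractions = [rng.betavariate(1.0, alpha) for _ in range(truncation - 1)]
    fractions.append(1.0)  # last cluster absorbs the remaining stick
    return stick_breaking_weights(fractions)


# alpha and truncation are illustrative values, not taken from the paper.
weights = sample_truncated_weights(alpha=2.0, truncation=20)
```

Under this construction, smaller values of `alpha` tend to concentrate mass on the first few clusters, while larger values spread it across more clusters. That is one way a model can leave the effective number of clusters open rather than fixing it in advance.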

Paper Structure

This paper contains 23 sections, 3 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: IH-CC framework. The diagram is adapted from the contrastive clustering (CC) diagram of Li et al. (2021).
  • Figure 2: Individual Clusters for P001
  • Figure 3: Example of inter-participant clusters (rows) and intra-participant clusters (divided within rows)
  • Figure 4: NMI across participants
  • Figure 5: Including the PSH increases the silhouette score (left) and Dunn index (right) of participant-specific subclusters, indicating tighter clustering of distinct environments.
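
For readers unfamiliar with the Dunn index named in Figure 5, the following is a minimal sketch of one common variant: the smallest distance between points in different clusters divided by the largest within-cluster diameter. Higher values indicate tighter, better-separated clusters. Several variants of the index exist, and this summary does not say which one the paper uses; the function name and the toy data are assumptions.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max within-cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Largest distance between any two points that share a cluster.
    max_diameter = max(
        (
            math.dist(a, b)
            for members in clusters.values()
            for a, b in combinations(members, 2)
        ),
        default=0.0,
    )
    # Smallest distance between any two points in different clusters.
    min_separation = min(
        math.dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point or coincident points
    return min_separation / max_diameter


# Toy 2-D embeddings: two tight clusters ten units apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
score = dunn_index(toy_points, toy_labels)
```

This pairwise version costs quadratic time in the number of points, which is acceptable for a small illustration but may need a vectorized implementation for large image-embedding sets.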