Multi-Integration of Labels across Categories for Component Identification (MILCCI)

Noga Mudrik, Yuxi Chen, Gal Mishne, Adam S. Charles

TL;DR

MILCCI addresses the challenge of mapping multi-category trial labels to high-dimensional time-series by learning category-specific component dictionaries with label-conditioned variants and trial-specific temporal traces. It decomposes observations as a sum over categories with variant-aware loadings, and optimizes via a three-stage fitting process that enforces sparsity and label-distance consistency while smoothing temporal trajectories. Across synthetic data and diverse real-world datasets (voting histories, Wikipedia pageviews, and multi-region neural recordings), MILCCI demonstrates improved recoverability of underlying components and interpretable, label-aware patterns, outperforming conventional tensor and matrix decompositions. The framework enables flexible, cross-trial analysis that separates label-driven structure from non-label-driven variability, with potential extensions to non-linear dynamics and multi-modal data.

Abstract

Many fields collect large-scale temporal data through repeated measurements (trials), where each trial is labeled with a set of metadata variables spanning several categories. For example, a trial in a neuroscience study may be linked to a value from category (a): task difficulty, and category (b): animal choice. A critical challenge in time-series analysis is to understand how these labels are encoded within the multi-trial observations, and disentangle the distinct effect of each label entry across categories. Here, we present MILCCI, a novel data-driven method that i) identifies the interpretable components underlying the data, ii) captures cross-trial variability, and iii) integrates label information to understand each category's representation within the data. MILCCI extends a sparse per-trial decomposition that leverages label similarities within each category to enable subtle, label-driven cross-trial adjustments in component compositions and to distinguish the contribution of each category. MILCCI also learns each component's corresponding temporal trace, which evolves over time within each trial and varies flexibly across trials. We demonstrate MILCCI's performance through both synthetic and real-world examples, including voting patterns, online page view trends, and neuronal recordings.
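As a rough sketch of the decomposition described above, the per-trial model can be written as follows. The symbol $\bm{Y}^{(m)}$ for the observations of trial $m$, the index $\ell_k^{(m)}$ for the option of category $\text{(k)}$ in that trial's label, and the row-block $\bm{\Phi}^{(k,m)}$ are notation assumed here for illustration; $\mathcal{A}^{\text{(k)}}$ and $\bm{\Phi}^{(m)}$ follow the figure captions.

$$
\bm{Y}^{(m)} \;\approx\; \Big[\, \mathcal{A}^{\text{(a)}}_{::\ell_a^{(m)}},\; \mathcal{A}^{\text{(b)}}_{::\ell_b^{(m)}},\; \dots \Big]\, \bm{\Phi}^{(m)} \;=\; \sum_{\text{(k)}} \mathcal{A}^{\text{(k)}}_{::\ell_k^{(m)}}\, \bm{\Phi}^{(k,m)}
$$

Here $\bm{\Phi}^{(k,m)}$ denotes the rows of $\bm{\Phi}^{(m)}$ that belong to category $\text{(k)}$'s components, so the horizontally concatenated loading matrix and the sum over categories are the same product written two ways.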

Paper Structure

This paper contains 37 sections, 10 equations, 38 figures, 4 tables, and 1 algorithm.

Figures (38)

  • Figure 1: Illustration. I: Time-series (e.g., brain recordings) across $M$ trials of varying duration ($\{T^{(m)}\}_{m=1}^M$). Each trial $m$ is associated with a label $L^{(m)}$, which is a set of experimental variables spanning different categories (e.g., $L^{(m)} = (\text{easy task}, \text{correct choice})$). II: Each category $\text{(k)}$'s components are represented by a tensor $\mathcal{A}^{\text{(k)}}$, whose $i$-th variant ($\mathcal{A}^{\text{(k)}}_{::i}$) refers to the $i$-th option of that category (e.g., if the $2$-nd option of category (b) is "correct choice", then $\mathcal{A}^{\text{(b)}}_{::2}$ holds the correct-choice components). III: Each trial $m$ is modeled via a sparse factorization, with its sparse components defined by selecting a variant (layer) from each category's tensor based on that trial's label (green borders, II), and then concatenating all selected variants horizontally (green borders, III). This forms the loading matrix of that trial. Importantly: 1) trials with identical labels use identical loadings, 2) components can subtly adjust their composition under shifts in the respective category values while remaining consistent (e.g., same component under task difficulty 1 vs. 2: $\|\mathcal{A}^\text{(a)}_{::1}-\mathcal{A}^\text{(a)}_{::2}\|_F < \epsilon$), and 3) component temporal traces ($\{\bm{\Phi}^{(m)}\}_{m=1}^M$) can vary flexibly across trials.
  • Figure 2: MILCCI Recovers True Representations in Synthetic Data. A-B: Generated synthetic data (full data in Fig. \ref{fig:synthetic_supp}). Ground-truth components (examples in panel A) vary slightly across labels but remain fixed across same-label trials (rows). Ground-truth traces vary across trials (B, colored by difficulty). C-D: Identified vs. ground-truth components (C) and time-traces (D) for random trials. E: Histogram of correlations between identified components and traces vs. their true counterparts. F-G: Comparison of MILCCI to other methods (limited to the same $4$-component dimension) based on traces (random trial, F) and reconstruction performance (G, baseline details in Sec. \ref{sec:baselines_info}).
  • Figure 3: Voting Results: Identified example components (A) and traces (B, mean and $80\%$ confidence interval). See the full results in Fig. \ref{fig:Voting_Data}.
  • Figure 4: Identified Wiki page-view example components (A) and traces averaged by different categories (B).
  • Figure 5: MILCCI identifies meaningful neuronal ensembles in real-world brain data. A: Experimental setting (from international2025brain). B: Component traces (all 1011 trials) sorted horizontally by trial correctness (left panel) and $\text{Prob}(\text{left})$ (right panel). C: Average within-trial values of exemplary traces reveal varying degrees of temporal drift over trials. D: Ensembles identified (example trial). E: Differences in ensemble composition across trials. F: Trial-adjusted ensemble compositions over the first $250$ trials.
  • ...and 33 more figures
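
The loading-matrix construction described in the Figure 1 caption (pick one variant per category according to the trial's label, then concatenate horizontally) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, the category sizes, and the random placeholder values are all hypothetical, and no fitting, sparsity, or smoothing step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 6  # number of observed channels (hypothetical size)

# One tensor per category, shaped (channels, components, options).
# Values are random placeholders; in the method these would be learned and sparse.
A = {
    "a": rng.normal(size=(N, 2, 3)),  # e.g. task difficulty: 2 components, 3 options
    "b": rng.normal(size=(N, 2, 2)),  # e.g. animal choice: 2 components, 2 options
}


def trial_loadings(A, label):
    """Select the labeled variant of each category and concatenate horizontally.

    Categories are visited in the order of A's keys, so two trials with the
    same label always get the same loading matrix.
    """
    return np.concatenate([A[k][:, :, label[k]] for k in A], axis=1)


def reconstruct(A, label, Phi):
    """Approximate one trial as (loading matrix) @ (temporal traces)."""
    return trial_loadings(A, label) @ Phi


def variant_distance(A_k, i, j):
    """Frobenius distance between two variants of one category's components."""
    return np.linalg.norm(A_k[:, :, i] - A_k[:, :, j])


label = {"a": 0, "b": 1}         # option index per category for this trial
T = 50                           # trial duration (may differ across trials)
Phi = rng.normal(size=(4, T))    # one trace per component (2 + 2 rows)
Y_hat = reconstruct(A, label, Phi)
```

Because the loading matrix is a horizontal concatenation, `reconstruct` is equivalent to summing each category's selected variant times its own block of trace rows. `variant_distance` computes the quantity that the caption bounds by $\epsilon$ for neighboring label values.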