Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision
Zhouhang Xie, Tushar Khot, Bhavana Dalvi Mishra, Harshit Surana, Julian McAuley, Peter Clark, Bodhisattwa Prasad Majumder
TL;DR
Instruct-LF discovers goal-aligned, interpretable latent factors from unstructured data by coupling LLM-driven property proposals with gradient-based latent factor modeling. The framework first builds a data-property matrix: an LLM proposes properties for each data point, and a dual-embedding link predictor estimates which properties hold for which data points. It then clusters properties into latent factors using Linear Corex, yielding interpretable, task-relevant concepts. Across movie dialogues, Alfworld navigation logs, and American bill documents, Instruct-LF improves downstream task performance and outperforms state-of-the-art baselines, with human evaluators favoring its factors and groupings. The approach reduces reliance on strong LLM reasoning, scales to large noisy datasets, and offers a practical pipeline for goal-conditioned pattern discovery.
Abstract
Instruction-following LLMs have recently allowed systems to discover hidden concepts from a collection of unstructured documents based on a natural language description of the purpose of the discovery (i.e., the goal). Still, the quality of the discovered concepts remains mixed, as it depends heavily on the LLM's reasoning ability and drops when the data is noisy or beyond the LLM's knowledge. We present Instruct-LF, a goal-oriented latent factor discovery system that integrates the instruction-following ability of LLMs with statistical models to handle large, noisy datasets where LLM reasoning alone falls short. Instruct-LF uses LLMs to propose fine-grained, goal-related properties from documents, estimates their presence across the dataset, and applies gradient-based optimization to uncover hidden factors, where each factor is represented by a cluster of co-occurring properties. We evaluate latent factors produced by Instruct-LF on movie recommendation, text-world navigation, and legal document categorization tasks. These interpretable representations improve downstream task performance by 5-52% over the best baselines and were preferred 1.8 times as often as the best alternative, on average, in human evaluation.
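To make the idea of a factor as "a cluster of co-occurring properties" concrete, the following is a minimal sketch of the final grouping step only. It rests on several assumptions that are not from the paper: the data-property matrix is a hand-written toy rather than the output of LLM proposals and a link predictor, the function name `group_properties` is hypothetical, and a plain SVD of the standardized matrix stands in for the gradient-based latent factor model the system actually uses.

```python
import numpy as np

def group_properties(X, n_factors):
    """Assign each property (column of X) to one latent factor.

    X is an (n_docs, n_props) matrix of estimated property presence.
    NOTE: this is an illustrative stand-in. It uses SVD loadings, not
    the gradient-based factor model described in the abstract.
    """
    # Standardize each property column so that co-occurrence shows up
    # as correlation rather than raw frequency.
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = Xc / std

    # Rows of Vt are directions over properties, ordered by the amount
    # of variance they explain.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = np.abs(Vt[:n_factors])  # shape: (n_factors, n_props)

    # Each property joins the factor on which it loads most strongly.
    return loadings.argmax(axis=0)

# Toy data-property matrix (hypothetical): rows are documents, columns
# are properties. Properties 0-2 always appear together, as do 3-4,
# and the two groups vary independently of each other.
X = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
], dtype=float)

labels = group_properties(X, n_factors=2)
print(labels)
```

On this toy input, properties that always co-occur receive the same factor label and the two groups receive different labels. A real data-property matrix would be noisy and far larger, which is where a dedicated latent factor model becomes necessary.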
