A Model for Combinatorial Dictionary Learning and Inference
Avrim Blum, Kavya Ravichandran
TL;DR
The paper introduces a combinatorial, occlusion-based model for dictionary learning on 1D canvases and defines well-structured object families so that the dictionary is learnable and instances can be segmented with guarantees. It develops a learning approach inspired by shotgun sequencing to recover the dictionary. It also presents two inference algorithms, a dynamic program (DP) for arbitrary objects and a greedy, signature-based method for well-structured objects, which yield correct explanations under various noise and structural assumptions. It analyzes sample complexity, proves correctness under the well-structuredness assumptions, and shows robustness to adversarial corruption when the objects satisfy a stronger, epsilon-strong version of the structure. It also shows that learning without any assumptions is computationally intractable (NP-hard). The framework offers a principled way to study non-linear, combinatorial dictionary learning and segmentation with provable guarantees in controlled settings, and it highlights both these theoretical guarantees and their practical limitations.
Abstract
We are often interested in decomposing complex, structured data into simple components that explain the data. The linear version of this problem is well studied as dictionary learning and factor analysis. In this work, we propose a combinatorial model in which to study this question, motivated by the way objects occlude each other in a scene to form an image. First, we identify a property we call "well-structuredness" of a set of low-dimensional components, which ensures that no two components in the set are too similar. We show that well-structuredness is sufficient for learning the set of latent components comprising a set of sample instances. We then consider the problem: given a set of components and an instance generated from some unknown subset of them, identify which parts of the instance arise from which components. We consider two variants: (1) determine the minimal number of components required to explain the instance; (2) determine the correct explanation for as many locations as possible. For the latter goal, we also devise a version that is robust to adversarial corruptions under a slightly stronger assumption on the components. Finally, we show that the learning problem is computationally infeasible in the absence of any assumptions.
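To make the occlusion idea concrete, here is a minimal sketch of a 1D occlusion process. The abstract does not spell out the model's exact rules, so this code assumes that objects are fixed sequences of nonzero integer symbols, that they are placed at integer offsets on a canvas of background cells, and that they are painted in order, with later objects overwriting earlier ones. The names `render`, `ownership`, and `BACKGROUND` are hypothetical and illustrative; this is not the authors' code. The `ownership` map records which placement is visible at each cell, which is the kind of per-location "explanation" the inference problem asks for.

```python
from typing import List, Sequence, Tuple

# Hypothetical convention: 0 marks an empty (background) canvas cell,
# and dictionary objects use nonzero symbols.
BACKGROUND = 0

Placement = Tuple[Sequence[int], int]  # (object symbols, left offset)


def render(canvas_len: int, placements: Sequence[Placement]) -> List[int]:
    """Paint objects onto a 1D canvas in order; later objects occlude earlier ones."""
    canvas = [BACKGROUND] * canvas_len
    for obj, offset in placements:
        for i, symbol in enumerate(obj):
            pos = offset + i
            if 0 <= pos < canvas_len:  # parts falling off the canvas are clipped
                canvas[pos] = symbol
    return canvas


def ownership(canvas_len: int, placements: Sequence[Placement]) -> List[int]:
    """For each cell, the index of the placement visible there (-1 = background)."""
    owner = [-1] * canvas_len
    for k, (obj, offset) in enumerate(placements):
        for i in range(len(obj)):
            pos = offset + i
            if 0 <= pos < canvas_len:
                owner[pos] = k  # a later placement overwrites earlier owners
    return owner


if __name__ == "__main__":
    a = [1, 2, 3, 4]  # hypothetical dictionary object A
    b = [5, 6]        # hypothetical dictionary object B
    scene = [(a, 1), (b, 3)]  # A placed first, then B partially covers it
    print(render(8, scene))     # [0, 1, 2, 5, 6, 0, 0, 0]
    print(ownership(8, scene))  # [-1, 0, 0, 1, 1, -1, -1, -1]
```

The design choice that makes this model combinatorial rather than linear is the overwrite in the inner loop: where A and B overlap, a linear dictionary model would add their contributions, whereas here the cell simply shows B's symbol and A's covered portion leaves no trace.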
