A Model for Combinatorial Dictionary Learning and Inference

Avrim Blum; Kavya Ravichandran

TL;DR

The paper introduces a combinatorial, occlusion-based model for dictionary learning on 1D canvases and defines well-structured object families that make learning and segmentation guarantees possible. It develops a learning approach inspired by shotgun sequencing to recover the dictionary. It also presents two inference algorithms, a dynamic program for arbitrary objects and a greedy signature-based method for well-structured objects, that yield correct explanations under various noise and structural assumptions. It analyzes sample complexity, proves correctness under the well-structuredness assumptions, and shows robustness to adversarial corruption when the objects are $\epsilon$-strongly, $w$-well-structured. Finally, it shows that learning without any assumptions is computationally intractable (NP-hard). The framework offers a principled way to study non-linear, combinatorial dictionary learning and segmentation with provable guarantees in controlled settings, and it makes both the guarantees and their limitations explicit.
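The greedy signature-based idea can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: pixels are integers, the canvas is a Python list, and well-structuredness is simplified to "every length-$w$ window occurs at most once across all objects, and no window straddling two objects matches an object's window". The names `build_signature_index` and `label_pixels` are hypothetical. Each length-$w$ window of the image is looked up in an index of object windows, and a hit labels those $w$ pixels with the matching object.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def build_signature_index(
    objects: Sequence[Sequence[int]], w: int
) -> Dict[Tuple[int, ...], Tuple[int, int]]:
    """Map every length-w window of every object to (object id, offset).

    Assumption: windows are unique across all objects (a simplified stand-in
    for w-well-structuredness); a repeated window raises ValueError.
    """
    index: Dict[Tuple[int, ...], Tuple[int, int]] = {}
    for obj_id, obj in enumerate(objects):
        for off in range(len(obj) - w + 1):
            key = tuple(obj[off:off + w])
            if key in index:
                raise ValueError(f"window {key} repeats; not well-structured in this simplified sense")
            index[key] = (obj_id, off)
    return index


def label_pixels(
    image: Sequence[int], objects: Sequence[Sequence[int]], w: int
) -> List[Optional[int]]:
    """Label each pixel with the id of an object whose window covers it, else None."""
    index = build_signature_index(objects, w)
    labels: List[Optional[int]] = [None] * len(image)
    for i in range(len(image) - w + 1):
        hit = index.get(tuple(image[i:i + w]))
        if hit is not None:
            obj_id, _offset = hit
            # If two hits overlap, the later one wins; this sketch does not
            # try to resolve such conflicts.
            for j in range(i, i + w):
                labels[j] = obj_id
    return labels
```

For example, placing object `(1, 2, 3, 6, 7)` at position 0 and object `(4, 5, 8)` on top of it at position 2 yields the image `[1, 2, 4, 5, 8]`, and `label_pixels([1, 2, 4, 5, 8], [(1, 2, 3, 6, 7), (4, 5, 8)], 2)` returns `[0, 0, 1, 1, 1]`. Pixels not covered by any matching window stay `None`.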

Abstract

We are often interested in decomposing complex, structured data into simple components that explain the data. The linear version of this problem is well studied as dictionary learning and factor analysis. In this work, we propose a combinatorial model in which to study this question, motivated by the way objects occlude each other in a scene to form an image. First, we identify a property we call "well-structuredness" of a set of low-dimensional components, which ensures that no two components in the set are too similar. We show that well-structuredness suffices for learning the set of latent components comprising a set of sample instances. We then consider the problem: given a set of components and an instance generated from some unknown subset of them, identify which parts of the instance arise from which components. We consider two variants: (1) determine the minimal number of components required to explain the instance; (2) determine the correct explanation for as many locations as possible. For the latter goal, we also devise a version that is robust to adversarial corruptions, under only a slightly stronger assumption on the components. Finally, we show that the learning problem is computationally infeasible in the absence of any assumptions.
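The paper's dynamic program for variant (1) is not reproduced on this page. As a simplified, hypothetical stand-in, the sketch below computes the fewest contiguous pieces, each a substring of some component, that tile a 1D instance. It deliberately ignores occlusion bookkeeping: a component split into two visible pieces by an occluder counts twice, so the result can exceed the true minimal number of components. The names `all_substrings` and `min_pieces` are illustrative.

```python
from typing import Optional, Sequence, Set, Tuple


def all_substrings(objects: Sequence[Sequence[int]]) -> Set[Tuple[int, ...]]:
    """Every non-empty contiguous substring of every object."""
    subs: Set[Tuple[int, ...]] = set()
    for obj in objects:
        for a in range(len(obj)):
            for b in range(a + 1, len(obj) + 1):
                subs.add(tuple(obj[a:b]))
    return subs


def min_pieces(image: Sequence[int], objects: Sequence[Sequence[int]]) -> Optional[int]:
    """Fewest contiguous pieces, each a substring of some object, tiling the image.

    Returns None if no such tiling exists.
    """
    subs = all_substrings(objects)
    n = len(image)
    inf = n + 1  # more pieces than any valid tiling can need
    best = [0] + [inf] * n  # best[k] = fewest pieces covering image[:k]
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] + 1 < best[end] and tuple(image[start:end]) in subs:
                best[end] = best[start] + 1
    return best[n] if best[n] <= n else None
```

For instance, `min_pieces([1, 2, 4, 5, 7], [(1, 2, 3, 6, 7), (4, 5)])` returns 3, although placing `(4, 5)` on top of `(1, 2, 3, 6, 7)` at offset 2 explains the same image with only two components. That gap is exactly what an occlusion-aware formulation has to account for.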

Paper Structure (64 sections, 26 theorems, 29 equations, 3 figures, 5 algorithms)


Introduction
Related Work
Notation, Preliminaries, and Definitions
Notation
Assumptions
Structural Properties and Object Generation
Image Generation
Defining the "scene" and "view"
Formal Statement of the Problem
Learning:
Inference:
Sequencing
Proof of Correctness
Sample Complexity
Learning
...and 49 more sections

Key Result

Lemma 2.0

[Random Objects are $w$-Well-Structured whp] A set of $m$ objects, each sampled uniformly at random from $\{0, 1, \ldots, c-1\}^{s_i}$, is $w$-well-structured with probability at least $1 - 3m^2 s^2 / c^w$, where $s \coloneqq \max_i s_i$. In particular, $w = O(\log ms)$ is sufficient so that the $m$ …
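Solving the lemma's bound for $w$ shows where the $O(\log ms)$ window size comes from: the failure probability is at most $\delta$ once $3m^2s^2/c^w \le \delta$, that is, once $w \ge \log_c(3m^2s^2/\delta)$. The sketch below computes that threshold and includes a quick empirical check. Because Definition 2.1 is not reproduced on this page, `windows_unique` tests only a simplified condition (every length-$w$ window occurs once across all objects); it is an assumption, not the paper's definition. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def failure_bound(m: int, s: int, c: int, w: int) -> float:
    """Upper bound 3 m^2 s^2 / c^w on the failure probability in Lemma 2.0."""
    return 3 * m**2 * s**2 / c**w


def smallest_window(m: int, s: int, c: int, delta: float) -> int:
    """Smallest integer w with 3 m^2 s^2 / c^w <= delta."""
    if c < 2 or delta <= 0:
        raise ValueError("need c >= 2 and delta > 0")
    w = 0
    while 3 * m**2 * s**2 > delta * c**w:
        w += 1
    return w


def windows_unique(objects: Sequence[Sequence[int]], w: int) -> bool:
    """Simplified check (assumption): no length-w window repeats across all objects."""
    seen = set()
    for obj in objects:
        for off in range(len(obj) - w + 1):
            key = tuple(obj[off:off + w])
            if key in seen:
                return False
            seen.add(key)
    return True


def random_objects(m: int, s: int, c: int, rng: random.Random) -> List[List[int]]:
    """m objects of length s whose pixels are uniform over {0, ..., c-1}."""
    return [[rng.randrange(c) for _ in range(s)] for _ in range(m)]
```

For $m = 10$ binary objects ($c = 2$) of length $s = 100$ and $\delta = 0.01$, `smallest_window(10, 100, 2, 0.01)` returns 29, since $3 \cdot 10^2 \cdot 100^2 = 3 \times 10^6$ lies between $0.01 \cdot 2^{28}$ and $0.01 \cdot 2^{29}$. Running `windows_unique` on many `random_objects` draws gives an empirical rate for the simplified condition that can be compared against the bound.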

Figures (3)

Figure 1: Object recovery from pieces. Because objects occlude each other, we cannot hope to see all $s$ pixels of the orange object at once, but as long as we see the blue pieces shown in this figure, i.e., pieces that overlap each other by at least $w$ pixels, we can reconstruct the object.
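The recovery step in Figure 1 resembles assembly in shotgun sequencing. Below is a minimal greedy sketch, assuming all pieces come from a single object whose length-$w$ windows are distinct, so that a suffix/prefix match of at least $w$ pixels identifies a true adjacency. Greedy longest-overlap merging is a standard assembly heuristic and is not claimed to be the paper's exact procedure; `assemble` and its helpers are hypothetical names.

```python
from typing import List, Sequence, Tuple

Piece = Tuple[int, ...]


def suffix_prefix_overlap(a: Piece, b: Piece, w: int) -> int:
    """Largest k >= w with a[-k:] == b[:k]; 0 if there is none."""
    for k in range(min(len(a), len(b)), w - 1, -1):
        if a[len(a) - k:] == b[:k]:
            return k
    return 0


def contains(a: Piece, b: Piece) -> bool:
    """True if b occurs as a contiguous run inside a."""
    n = len(b)
    return any(a[i:i + n] == b for i in range(len(a) - n + 1))


def assemble(pieces: Sequence[Sequence[int]], w: int) -> List[Piece]:
    """Greedily merge pieces along their largest overlaps of at least w pixels."""
    # Deduplicate (keeping order), then drop pieces contained in another piece.
    uniq = list(dict.fromkeys(tuple(p) for p in pieces))
    frags = [p for p in uniq if not any(q != p and contains(q, p) for q in uniq)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = suffix_prefix_overlap(a, b, w)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlap of at least w pixels: the pieces leave a gap
        merged = frags[best_i] + frags[best_j][best_k:]
        rest = [p for t, p in enumerate(frags) if t not in (best_i, best_j)]
        frags = [p for p in rest if not contains(merged, p)] + [merged]
    return frags
```

With `obj = tuple(range(20))`, the pieces `obj[0:8]`, `obj[5:14]`, `obj[12:20]` and `obj[6:10]` reassemble into `[obj]` for `w = 2`. Two pieces that overlap by fewer than $w$ pixels stay separate, which mirrors the coverage condition in the caption.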
Figure 2: Process used to learn objects from images in the absence of endpoint markers. This algorithm collects length-$4w$ pieces that cover the orange object redundantly. However, because of the risk of problematic overlaps, we must discard a length-$w$ piece (blue) at each end. Even after this discard, the remaining pieces (green) used for sequencing must still cover the object.
Figure 3: Consider the $L$-pixel problematic overlap string boxed in blue. Five features uniquely define it. The first is the identity of the left object (darker orange), and the second is the index within it at which the string of interest (boxed in blue) starts. The third is the identity of the right object (lighter orange), and the fourth is the index within it at which the string of interest ends. Since the length $L$ is fixed, these four features determine how much the two objects overlap. Finally, a fifth feature, a single bit, records which object is on top.
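Read as a counting argument, the caption bounds how many distinct problematic overlap strings of a fixed length $L$ can exist. A short sketch, assuming $m$ objects each of length at most $s$: each feature takes a bounded number of values, so the number of such strings is at most

```latex
\underbrace{m}_{\text{left object}} \cdot
\underbrace{s}_{\text{start index}} \cdot
\underbrace{m}_{\text{right object}} \cdot
\underbrace{s}_{\text{end index}} \cdot
\underbrace{2}_{\text{top bit}}
\;=\; 2m^2 s^2 .
```

This is an upper bound rather than an exact count, since different feature tuples may produce the same pixel string.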

Theorems & Definitions (80)

Definition 2.1: $w$-well-structured
Definition 2.2: $\epsilon$-strongly, $w$-well-structured
Lemma 2.0
Definition 2.3: Distinct Background
Definition 2.4: $w$-Well-Structured Background
Definition 2.5: Uniform Model
Definition 2.6: Closed Room Model
Definition 2.7: Open Room Model
Definition 2.8: Fully Random Model: Random Horizontal; Random Depth
Definition 2.9: Partially Random Model: Random Horizontal; Arbitrary Depth
...and 70 more
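The generation models listed above (Definitions 2.5 through 2.9) are not spelled out on this page. As a hypothetical illustration of the occlusion mechanism behind them, the sketch below renders a 1D canvas from placed objects and draws a placement with random horizontal offsets and a random depth order, loosely in the spirit of Definition 2.8. The conventions are assumptions: pixels are integers, the background value defaults to 0, a smaller depth is nearer the viewer, and randomly placed objects lie fully on the canvas. `render` and `random_scene` are illustrative names.

```python
import random
from typing import List, Sequence, Tuple

# (object pixels, left offset on the canvas, depth); smaller depth = nearer.
Placement = Tuple[Sequence[int], int, int]


def render(n: int, placements: Sequence[Placement], background: int = 0) -> List[int]:
    """Render an n-pixel 1D canvas by occlusion (painter's algorithm).

    Pixels that fall outside [0, n) are clipped.
    """
    canvas = [background] * n
    # Paint from the farthest object to the nearest so nearer ones overwrite.
    for obj, offset, _depth in sorted(placements, key=lambda p: p[2], reverse=True):
        for k, pixel in enumerate(obj):
            x = offset + k
            if 0 <= x < n:
                canvas[x] = pixel
    return canvas


def random_scene(n: int, objects: Sequence[Sequence[int]], rng: random.Random) -> List[Placement]:
    """Random horizontal offsets and a random depth order (hypothetical model).

    Each object gets a uniform offset that keeps it fully on the canvas
    (so every object must have length <= n) and a distinct depth taken
    from a random permutation.
    """
    depths = list(range(len(objects)))
    rng.shuffle(depths)
    return [(obj, rng.randrange(n - len(obj) + 1), d) for obj, d in zip(objects, depths)]
```

For example, `render(6, [((1, 1, 1), 0, 2), ((2, 2), 1, 1)])` returns `[1, 2, 2, 0, 0, 0]`: the nearer object `(2, 2)` hides two pixels of the farther one. Painting back to front keeps occlusion to a single overwrite rule, which is all a 1D canvas needs.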
