A Unified Probabilistic Framework for Dictionary Learning with Parsimonious Activation
Zihui Zhao, Yuanbo Tang, Jieyu Ren, Xiaoping Zhang, Yang Li
TL;DR
This work addresses a limitation of traditional dictionary learning, which emphasizes sample-wise sparsity, by introducing a parsimony-promoting, row-wise activation regularizer within a Bayesian MAP framework. By deriving the objective $\min_{D,R} \|X - DR\|_F^2 + \lambda_1 \|R\|_1 + \lambda_2 \sum_i \|r_i\|_\infty$, where $r_i$ denotes the $i$-th row of the coefficient matrix $R$, from Beta-Bernoulli priors, the authors connect the hyperparameters to prior distributions and provide principled parameter selection. The approach yields substantial reconstruction improvements (roughly a 20% reduction in RMSE) while activating far fewer dictionary atoms, and the authors establish MDL and pathlet-learning interpretations to ground the method theoretically. The framework offers a principled path toward compact, interpretable dictionaries and opens opportunities to integrate with deep or multi-modal representations while maintaining rigorous sparsity control.
Abstract
Dictionary learning is traditionally formulated as an $L_1$-regularized signal reconstruction problem. While recent developments have incorporated discriminative, hierarchical, or generative structures, most approaches still encourage sparsity in the representation of each individual sample and overlook how atoms are shared across samples, resulting in redundant and sub-optimal dictionaries. We introduce a parsimony-promoting regularizer based on the row-wise $L_\infty$ norm of the coefficient matrix. This additional penalty encourages entire rows of the coefficient matrix to vanish, thereby reducing the number of dictionary atoms activated across the dataset. We derive the formulation from a probabilistic model with Beta-Bernoulli priors, which provides a Bayesian interpretation linking the regularization parameters to prior distributions. We further provide a theoretical calculation for optimal hyperparameter selection and connect our formulation to Minimum Description Length, Bayesian model selection, and pathlet learning. Extensive experiments on benchmark datasets demonstrate that our method achieves substantially improved reconstruction quality (a 20\% reduction in RMSE) and enhanced representation sparsity, utilizing fewer than one-tenth of the available dictionary atoms, while empirically validating our theoretical analysis.
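
To make the combined penalty concrete, the following is a minimal NumPy sketch that only evaluates the regularized objective for given matrices; it is not the authors' optimization procedure. It assumes that $\|R\|_1$ is the entrywise $L_1$ norm and that each row of $R$ holds the activations of one atom across all samples. The function names `parsimonious_objective` and `active_atoms`, and the tolerance `tol`, are hypothetical and chosen for illustration.

```python
import numpy as np


def parsimonious_objective(X, D, R, lam1, lam2):
    """Evaluate ||X - D R||_F^2 + lam1 * ||R||_1 + lam2 * sum_i ||r_i||_inf.

    Assumptions (not confirmed by the source): ||R||_1 is the entrywise
    L1 norm, and r_i is the i-th row of R (one row per dictionary atom).
    """
    residual = X - D @ R
    fit = np.sum(residual ** 2)                # squared Frobenius norm
    l1 = np.abs(R).sum()                       # sample-wise sparsity term
    row_linf = np.abs(R).max(axis=1).sum()     # row-wise L_inf parsimony term
    return fit + lam1 * l1 + lam2 * row_linf


def active_atoms(R, tol=1e-8):
    """Count atoms whose coefficient row is not (numerically) all zero."""
    return int(np.sum(np.abs(R).max(axis=1) > tol))


# Toy example: 3 atoms, 2 samples; the second atom is never used.
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
R = np.array([[1.0, -2.0],
              [0.0, 0.0],
              [0.5, 0.0]])
X = D @ R  # exact reconstruction, so the data-fit term is zero

value = parsimonious_objective(X, D, R, lam1=0.1, lam2=1.0)
n_active = active_atoms(R)
```

In this toy case the row-wise term charges each atom once, by its largest absolute coefficient, regardless of how many samples use it. An all-zero row therefore contributes nothing, which is the sense in which the penalty rewards leaving entire atoms unused.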
