## The Sample Complexity of Dictionary Learning

** Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein**; 12(100):3259−3281, 2011.

### Abstract

A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective.

We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected *L _{2}* error in representation when the dictionary is used. For the case of

*l*regularized coefficient selection we provide a generalization bound of the order of

_{1}*O(√np ln(mλ)/m)*, where

*n*is the dimension,

*p*is the number of elements in the dictionary,

*λ*is a bound on the

*l*norm of the coefficient vector and

_{1}*m*is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most

*k*dictionary elements, we provide a bound of the order

*O(√np ln(mk)/m)*under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for

*most*dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as

*1/m*, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements.

© JMLR 2011. (edit, beta) |