k-SVD

Revision as of 00:27, 28 May 2024 by imported>Comp.arch

In applied mathematics, k-SVD is a dictionary learning algorithm that builds a dictionary for sparse representations using a singular value decomposition (SVD) approach. k-SVD generalizes the k-means clustering method: it alternates between sparse coding the input data with the current dictionary and updating the atoms of the dictionary to better fit the data. It is structurally related to the expectation–maximization (EM) algorithm.[1][2] k-SVD is widely used in applications such as image processing, audio processing, biology, and document analysis.

k-SVD algorithm

k-SVD generalizes k-means in the following sense. k-means clustering can itself be regarded as a method of sparse representation: it finds the best possible codebook to represent the data samples $\{y_i\}_{i=1}^{N}$ by nearest neighbor, by solving

$$\min_{D,X} \left\{ \|Y - DX\|_F^2 \right\} \quad \text{subject to } \forall i,\; x_i = e_k \text{ for some } k,$$

which is nearly equivalent to

$$\min_{D,X} \left\{ \|Y - DX\|_F^2 \right\} \quad \text{subject to } \forall i,\; \|x_i\|_0 = 1,$$

which is k-means that allows "weights": the single nonzero coefficient of each $x_i$ may take any value instead of being fixed at 1.

The subscript $F$ denotes the Frobenius norm. The constraint $x_i = e_k$ forces k-means to use only one atom (column) of the dictionary $D$ for each sample. The k-SVD algorithm relaxes this constraint so that each signal is represented as a linear combination of atoms in $D$.
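To make the one-atom constraint concrete, the sketch below encodes the k-means assignment step as a one-hot coefficient matrix, so that $DX$ replaces each sample by its nearest centroid. The function name `kmeans_codes` and the toy data are illustrative assumptions, not part of the original description.

```python
import numpy as np

def kmeans_codes(Y, D):
    """Assign each column of Y to its nearest column of D as a one-hot code."""
    # Squared distances between every atom (rows) and every sample (columns).
    dists = ((D[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)  # shape (K, N)
    nearest = dists.argmin(axis=0)
    X = np.zeros((D.shape[1], Y.shape[1]))
    X[nearest, np.arange(Y.shape[1])] = 1.0  # x_i = e_k for the nearest atom k
    return X

D = np.array([[0.0, 10.0],
              [0.0, 10.0]])          # two centroids stored as columns
Y = np.array([[1.0, 9.0, 0.0],
              [0.0, 11.0, -1.0]])    # three samples stored as columns
X = kmeans_codes(Y, D)
residual = np.linalg.norm(Y - D @ X, "fro") ** 2
```

Each column of `X` has exactly one nonzero entry, which is the constraint that k-SVD relaxes.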

The k-SVD algorithm follows the same construction flow as the k-means algorithm. In contrast to k-means, however, the sparsity constraint is relaxed so that each column $x_i$ may have more than one nonzero entry, but no more than a number $T_0$.

So, the objective function becomes

$$\min_{D,X} \left\{ \|Y - DX\|_F^2 \right\} \quad \text{subject to } \forall i,\; \|x_i\|_0 \le T_0,$$

or, in an alternative form,

$$\min_{D,X} \sum_i \|x_i\|_0 \quad \text{subject to } \forall i,\; \|Y - DX\|_F^2 \le \epsilon.$$

In the k-SVD algorithm, $D$ is first fixed and the best coefficient matrix $X$ is found. Because finding the truly optimal $X$ is hard, an approximation pursuit method is used. Any such algorithm, for example orthogonal matching pursuit (OMP), can be used to calculate the coefficients, as long as it can supply a solution with a fixed and predetermined number of nonzero entries $T_0$.
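The sparse coding step can be sketched with a basic orthogonal matching pursuit for a single signal. This is a minimal illustration, not a reference implementation: the function name `omp`, the early-stopping tolerance, and the assumption that the atoms of `D` have unit norm are choices made for this example.

```python
import numpy as np

def omp(D, y, T0):
    """Greedy sparse coding of one signal y with at most T0 atoms of D.

    Assumes the columns of D have unit norm (an assumption of this sketch).
    """
    x = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float).copy()
    for _ in range(T0):
        if np.linalg.norm(residual) < 1e-12:
            break  # signal already represented exactly
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom improves the fit
        support.append(k)
        # Refit all selected coefficients jointly by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    if support:
        x[support] = coeffs
    return x

# Toy dictionary: the three standard basis vectors plus one diagonal atom.
D = np.column_stack([np.eye(3), np.ones(3) / np.sqrt(3)])
y = np.array([2.0, 0.0, -3.0])
x = omp(D, y, T0=2)
```

Applying `omp` to every column of $Y$ yields the coefficient matrix $X$, with each column holding at most $T_0$ nonzero entries.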

After the sparse coding step, the next task is to search for a better dictionary $D$. Finding the whole dictionary at once is impossible, so the process updates only one column of $D$ at a time while keeping $X$ fixed. The update of the $k$-th column is done by rewriting the penalty term as

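$$\|Y - DX\|_F^2 = \left\| Y - \sum_{j=1}^{K} d_j x_T^j \right\|_F^2 = \left\| \left( Y - \sum_{j \ne k} d_j x_T^j \right) - d_k x_T^k \right\|_F^2 = \left\| E_k - d_k x_T^k \right\|_F^2$$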

where $x_T^k$ denotes the $k$-th row of $X$.
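The identity above can be checked numerically. The following sketch, with arbitrary sizes and random data chosen only for illustration, builds $DX$ as a sum of rank-1 matrices and confirms that the total error equals the error written in terms of $E_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, N, k = 5, 4, 8, 2           # illustrative sizes and atom index
Y = rng.standard_normal((n, N))
D = rng.standard_normal((n, K))
X = rng.standard_normal((K, N))

# DX as a sum of K rank-1 matrices: column j of D times row j of X.
rank_one_sum = sum(np.outer(D[:, j], X[j, :]) for j in range(K))

# E_k: the residual when the contribution of atom k is left out.
E_k = Y - D @ X + np.outer(D[:, k], X[k, :])

total_error = np.linalg.norm(Y - D @ X, "fro") ** 2
split_error = np.linalg.norm(E_k - np.outer(D[:, k], X[k, :]), "fro") ** 2
```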

By decomposing the product $DX$ into a sum of $K$ rank-1 matrices, the other $K-1$ terms can be assumed fixed while the $k$-th remains unknown. The minimization problem can then be solved by approximating $E_k$ with a rank-1 matrix using the singular value decomposition and updating $d_k$ accordingly. However, the new solution for the vector $x_T^k$ is not guaranteed to be sparse.
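The loss of sparsity can be seen in a small experiment. In this sketch (random data and sizes are assumptions for illustration), row $k$ of $X$ starts with half of its entries equal to zero, and the unrestricted rank-1 SVD update fills them in.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, N, k = 5, 4, 10, 0
Y = rng.standard_normal((n, N))
D = rng.standard_normal((n, K))
X = rng.standard_normal((K, N))
X[k, ::2] = 0.0                     # row k uses only the odd-indexed samples

E_k = Y - D @ X + np.outer(D[:, k], X[k, :])
U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
d_new = U[:, 0]                     # leading left singular vector
x_new = s[0] * Vt[0, :]             # scaled leading right singular vector

old_error = np.linalg.norm(E_k - np.outer(D[:, k], X[k, :]), "fro")
new_error = np.linalg.norm(E_k - np.outer(d_new, x_new), "fro")
nonzeros_before = np.count_nonzero(X[k, :])
nonzeros_after = np.count_nonzero(x_new)
```

The rank-1 fit lowers the error, but `x_new` is generally dense, so the sparsity constraint on the columns of $X$ would be violated.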

To cure this problem, define $\omega_k$ as

$$\omega_k = \{ i \mid 1 \le i \le N,\; x_T^k(i) \ne 0 \},$$

which points to the examples $\{y_i\}_{i=1}^{N}$ that use atom $d_k$ (that is, those for which $x_T^k(i)$ is nonzero). Then, define $\Omega_k$ as a matrix of size $N \times |\omega_k|$, with ones on the $(\omega_k(i), i)$-th entries and zeros elsewhere. Multiplying $\tilde{x}_T^k = x_T^k \Omega_k$ shrinks the row vector $x_T^k$ by discarding its zero entries. Similarly, $\tilde{Y}_k = Y \Omega_k$ is the subset of the examples that currently use the atom $d_k$, and the same restriction gives $\tilde{E}_k = E_k \Omega_k$.
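A small sketch of this restriction follows. The helper name `selection_matrix` and the toy values are assumptions for illustration, and indices are 0-based as is usual in NumPy.

```python
import numpy as np

def selection_matrix(x_row):
    """Return (omega_k, Omega_k) for one row x_T^k of X."""
    omega = np.flatnonzero(x_row)              # 0-based indices of nonzero entries
    Omega = np.zeros((x_row.size, omega.size))
    Omega[omega, np.arange(omega.size)] = 1.0  # a single 1 in each column
    return omega, Omega

x_row = np.array([0.0, 2.0, 0.0, -1.0, 0.5])
omega, Omega = selection_matrix(x_row)
x_tilde = x_row @ Omega                        # zero entries discarded

Y = np.arange(10.0).reshape(2, 5)
Y_tilde = Y @ Omega                            # samples that use the atom
```

In practice the explicit matrix is rarely formed, since plain column indexing such as `Y[:, omega]` gives the same result as `Y @ Omega`.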

The minimization problem mentioned before therefore becomes

$$\left\| E_k \Omega_k - d_k x_T^k \Omega_k \right\|_F^2 = \left\| \tilde{E}_k - d_k \tilde{x}_T^k \right\|_F^2$$

and can be solved directly using the SVD. The SVD decomposes $\tilde{E}_k$ into $U \Delta V^T$. The solution for $d_k$ is the first column of $U$, and the coefficient vector $\tilde{x}_T^k$ is the first column of $V$ multiplied by $\Delta(1,1)$. After the whole dictionary has been updated, the process returns to solving for $X$ and then for $D$, iterating between the two steps.
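Putting the two stages together, the whole procedure can be sketched as below. This is a minimal illustration under stated assumptions rather than a reference implementation: the names `omp` and `ksvd`, the initialization from randomly chosen data columns, the fixed iteration count `n_iter`, and the decision to leave unused atoms unchanged are all choices made for this example. The compact `omp` is a basic greedy pursuit included only so that the block runs on its own.

```python
import numpy as np

def omp(D, y, T0):
    """Greedy pursuit: code y with at most T0 atoms (assumes unit-norm atoms)."""
    x, support, residual = np.zeros(D.shape[1]), [], y.copy()
    for _ in range(T0):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support or np.linalg.norm(residual) < 1e-12:
            break
        support.append(k)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    if support:
        x[support] = coeffs
    return x

def ksvd(Y, K, T0, n_iter=10, seed=0):
    """Alternate sparse coding and atom-by-atom SVD updates."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: K distinct data columns, scaled to unit norm.
    D = Y[:, rng.choice(Y.shape[1], K, replace=False)]
    D /= np.linalg.norm(D, axis=0)
    errors = []
    for _ in range(n_iter):
        # Sparse coding stage: D fixed, code every column of Y.
        X = np.column_stack([omp(D, y, T0) for y in Y.T])
        # Dictionary update stage: one atom at a time.
        for k in range(K):
            omega = np.flatnonzero(X[k, :])
            if omega.size == 0:
                continue  # unused atom, left unchanged in this sketch
            # Restricted residual without the contribution of atom k.
            E_tilde = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
            U, s, Vt = np.linalg.svd(E_tilde, full_matrices=False)
            D[:, k] = U[:, 0]              # first column of U
            X[k, omega] = s[0] * Vt[0, :]  # first column of V times Delta(1,1)
        errors.append(np.linalg.norm(Y - D @ X, "fro") ** 2)
    return D, X, errors

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 40))
D, X, errors = ksvd(Y, K=12, T0=3, n_iter=5)
```

Because only the entries indexed by `omega` are rewritten, each column of `X` keeps at most `T0` nonzero entries. Since OMP is only an approximate solver, the recorded `errors` may not decrease at every iteration.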

Limitations

Choosing an appropriate "dictionary" for a dataset is a non-convex problem, and k-SVD operates by an iterative update that is not guaranteed to find the global optimum.[2] However, this limitation is shared by other algorithms for this purpose, and k-SVD works fairly well in practice.[2]

References