Affinity propagation

From testwiki
Revision as of 03:39, 8 May 2024 by imported>MtPenguinMonster (#suggestededit-add-desc 1.0)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Template:Short description In statistics and data mining, affinity propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points.[1] Unlike clustering algorithms such as [[K-means clustering|Template:Mvar-means]] or [[K-medoids|Template:Mvar-medoids]], affinity propagation does not require the number of clusters to be determined or estimated before running the algorithm. Similar to Template:Mvar-medoids, affinity propagation finds "exemplars," members of the input set that are representative of clusters.[1]

Algorithm

Let Template:Math through Template:Mvar be a set of data points, with no assumptions made about their internal structure, and let Template:Mvar be a function that quantifies the similarity between any two points, such that Template:Math if and only if Template:Mvar is more similar to Template:Mvar than to Template:Mvar. For this example, the negative squared distance of two data points was used i.e. for points Template:Mvar and Template:Mvar, Template:Nowrap[1]

The diagonal of Template:Mvar (i.e. s(i,i)) is particularly important, as it represents the instance preference, meaning how likely a particular instance is to become an exemplar. When it is set to the same value for all inputs, it controls how many classes the algorithm produces. A value close to the minimum possible similarity produces fewer classes, while a value close to or larger than the maximum possible similarity produces many classes. It is typically initialized to the median similarity of all pairs of inputs.

The algorithm proceeds by alternating between two message-passing steps, which update two matrices:[1]

Both matrices are initialized to all zeroes, and can be viewed as log-probability tables. The algorithm then performs the following updates iteratively:

  • First, responsibility updates are sent around: r(i,k)s(i,k)maxkk{a(i,k)+s(i,k)}
  • Then, availability is updated per a(i,k)min(0,r(k,k)+i∉{i,k}max(0,r(i,k))) for ik and a(k,k)ikmax(0,r(i,k)).

Iterations are performed until either the cluster boundaries remain unchanged over a number of iterations, or some predetermined number (of iterations) is reached. The exemplars are extracted from the final matrices as those whose 'responsibility + availability' for themselves is positive (i.e. (r(i,i)+a(i,i))>0).

Applications

The inventors of affinity propagation showed it is better for certain computer vision and computational biology tasks, e.g. clustering of pictures of human faces and identifying regulated transcripts, than Template:Mvar-means,[1] even when Template:Mvar-means was allowed many random restarts and initialized using PCA.[2] A study comparing affinity propagation and Markov clustering on protein interaction graph partitioning found Markov clustering to work better for that problem.[3] A semi-supervised variant has been proposed for text mining applications.[4] Another recent application was in economics, when the affinity propagation was used to find some temporal patterns in the output multipliers of the US economy between 1997 and 2017.[5]

Software

  • A Java implementation is included in the ELKI data mining framework.
  • A Julia implementation of affinity propagation is contained in Julia Statistics's Clustering.jl package.
  • A Python version is part of the scikit-learn library.
  • An R implementation is available in the "apcluster" package.

References

Template:Reflist