Zipf–Mandelbrot law

Revision as of 23:37, 14 July 2024 by imported>Mikhail Ryazanov (References: fmt.)

In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto–Zipf law, it is a power-law distribution on ranked data, named after the linguist George Kingsley Zipf, who suggested a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot, who subsequently generalized it.

The probability mass function is given by

$$f(k; N, q, s) = \frac{1}{H_{N,q,s}} \, \frac{1}{(k+q)^s},$$

where $H_{N,q,s}$ is given by

$$H_{N,q,s} = \sum_{i=1}^{N} \frac{1}{(i+q)^s},$$

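which may be thought of as a generalization of a harmonic number. In the formula, k is the rank of the data, N is the number of ranked elements, and q and s are parameters of the distribution. In the limit as N approaches infinity (with s > 1, so that the series converges), the normalizing sum becomes the Hurwitz zeta function; because the sum starts at i = 1, it equals $\zeta(s, q+1)$ under the usual convention $\zeta(s, a) = \sum_{n=0}^{\infty} (n+a)^{-s}$. For finite N and q = 0 the Zipf–Mandelbrot law becomes Zipf's law. For infinite N and q = 0 it becomes a zeta distribution.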
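As a minimal sketch, the probability mass function above can be evaluated directly from its definition. The function name `zipf_mandelbrot_pmf` and the parameter values are illustrative assumptions, not part of any standard library; the normalizing constant is computed by brute-force summation, which is adequate for moderate N.

```python
def zipf_mandelbrot_pmf(k, N, q, s):
    """Probability of rank k (1 <= k <= N) under the Zipf-Mandelbrot law."""
    if not 1 <= k <= N:
        return 0.0
    # Normalizing constant H_{N,q,s} = sum_{i=1}^{N} 1 / (i + q)^s
    H = sum(1.0 / (i + q) ** s for i in range(1, N + 1))
    return 1.0 / (H * (k + q) ** s)


# Illustrative parameter values, chosen arbitrarily for this sketch.
N, q, s = 1000, 2.7, 1.1
probs = [zipf_mandelbrot_pmf(k, N, q, s) for k in range(1, N + 1)]
total = sum(probs)  # equals 1 up to floating-point rounding
```

Setting q = 0 in this sketch gives Zipf's law on N ranks; for example, with N = 3 and s = 1 the probability of rank 1 is 1 / (1 + 1/2 + 1/3) = 6/11.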

Applications

The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law.

If one plots the frequency rank of words contained in a moderately sized corpus of text data versus the number of occurrences or actual frequencies, one obtains a power-law distribution, with exponent close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001). Zipf's law implicitly assumes a fixed vocabulary size: the harmonic series (s = 1) does not converge, so the distribution cannot be normalized over an unbounded vocabulary, whereas the Zipf–Mandelbrot generalization with s > 1 does converge. Furthermore, there is evidence that the closed class of functional words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of contentive words that vary by topic, field and register.[1]
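The convergence remark can be checked numerically. The sketch below (the helper name `generalized_harmonic` is an assumption made for illustration) computes partial sums of the normalizing series with q = 0: for s = 1 they keep growing roughly like ln N, while for s = 2 they settle near π²/6 ≈ 1.6449.

```python
def generalized_harmonic(N, q, s):
    """Partial sum H_{N,q,s} = sum_{i=1}^{N} 1 / (i + q)^s."""
    return sum(1.0 / (i + q) ** s for i in range(1, N + 1))


# Compare growth of the partial sums for s = 1 (divergent) and s = 2 (convergent).
for N in (10**2, 10**4, 10**6):
    h1 = generalized_harmonic(N, 0.0, 1.0)
    h2 = generalized_harmonic(N, 0.0, 2.0)
    print(f"N = {N:>7}: s=1 -> {h1:.4f}, s=2 -> {h2:.6f}")
```

With s = 1 the normalizing constant depends on where the vocabulary is cut off, which is why a fixed N has to be assumed; with s > 1 the limit as N grows is finite.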

In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law.[2]
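To make the idea of abundances following this law concrete, the following sketch draws individuals from a hypothetical community in which the species of rank k is chosen with Zipf–Mandelbrot probability, then tallies the observed abundances. The function name `sample_ranks`, the community size, and the parameter values are assumptions for illustration only; real field data would require fitting q and s, which is not shown here.

```python
import random
from collections import Counter


def sample_ranks(n, N, q, s, seed=0):
    """Draw n ranks from a Zipf-Mandelbrot law on ranks 1..N and count them."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ranks = range(1, N + 1)
    # Unnormalized weights 1 / (k + q)^s; random.choices normalizes internally.
    weights = [1.0 / (k + q) ** s for k in ranks]
    return Counter(rng.choices(ranks, weights=weights, k=n))


# Hypothetical community: 5 species, 10,000 sampled individuals.
counts = sample_ranks(n=10_000, N=5, q=0.0, s=2.0)
for rank in sorted(counts):
    print(rank, counts[rank])
```

Sorting the observed counts in decreasing order and plotting them against rank on log-log axes is one common way to compare such a sample with the theoretical curve.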

Within music, many metrics for measuring "pleasing" music conform to Zipf–Mandelbrot distributions.[3]
