Bernoulli distribution

Template:Short description Template:Use American English Template:Probability distribution Template:Probability fundamentals

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,[1] is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 - p. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails", respectively, and p would be the probability of the coin landing on heads (or vice versa where 1 would represent tails and p would be the probability of tails). In particular, unfair coins would have p ≠ 1/2.

The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so n would be 1 for such a binomial distribution). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.[2]

Properties

If X is a random variable with a Bernoulli distribution, then:

Pr(X = 1) = p = 1 - Pr(X = 0) = 1 - q.

The probability mass function f of this distribution, over possible outcomes k, is

f(k;p) = p if k = 1, and f(k;p) = q = 1 - p if k = 0.[3]

This can also be expressed as

f(k;p) = p^k (1 - p)^(1-k)   for k ∈ {0, 1}

or as

f(k;p) = pk + (1 - p)(1 - k)   for k ∈ {0, 1}.
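
A quick way to sanity-check the two equivalent forms above is to evaluate them in code; the following Python sketch (function names chosen purely for illustration) confirms they agree on both outcomes:

def bernoulli_pmf(k, p):
    # exponential form: p^k (1 - p)^(1 - k)
    return p**k * (1 - p)**(1 - k)

def bernoulli_pmf_linear(k, p):
    # linear form: p k + (1 - p)(1 - k)
    return p * k + (1 - p) * (1 - k)

# example: a biased coin with p = 0.3
for k in (0, 1):
    assert bernoulli_pmf(k, 0.3) == bernoulli_pmf_linear(k, 0.3)
    print(k, bernoulli_pmf(k, 0.3))  # prints 0 0.7 and 1 0.3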

The Bernoulli distribution is a special case of the binomial distribution with n=1.[4]

The kurtosis goes to infinity for high and low values of p, but for p=1/2 the two-point distributions including the Bernoulli distribution have a lower excess kurtosis, namely −2, than any other probability distribution.

The Bernoulli distributions for 0 ≤ p ≤ 1 form an exponential family.

The maximum likelihood estimator of p based on a random sample is the sample mean.
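
A minimal sketch of this estimator, assuming the sample is encoded as a list of 0/1 values (all names below are illustrative):

import random

def mle_p(samples):
    # the maximum likelihood estimate of p is simply the sample mean
    return sum(samples) / len(samples)

random.seed(0)
true_p = 0.3
samples = [1 if random.random() < true_p else 0 for _ in range(10_000)]
print(mle_p(samples))  # close to 0.3 for a sample of this size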

The probability mass function of a Bernoulli experiment along with its corresponding cumulative distribution function.

Mean

The expected value of a Bernoulli random variable X is

E[X]=p

This is because for a Bernoulli distributed random variable X with Pr(X=1)=p and Pr(X=0)=q we find

E[X] = Pr(X = 1)·1 + Pr(X = 0)·0 = p·1 + q·0 = p.[3]

Variance

The variance of a Bernoulli distributed X is

Var[X] = pq = p(1 - p)

We first find

E[X²] = Pr(X = 1)·1² + Pr(X = 0)·0²
= p·1² + q·0² = p = E[X]

From this follows

Var[X] = E[X²] - E[X]² = E[X] - E[X]²
= p - p² = p(1 - p) = pq[3]

With this result it is easy to prove that, for any Bernoulli distribution, its variance will have a value inside [0,1/4].
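
The mean and variance formulas can be checked against simulated data; the sketch below is illustrative only, using an arbitrarily chosen p:

import random
import statistics

random.seed(1)
p = 0.7
xs = [1 if random.random() < p else 0 for _ in range(100_000)]

print(statistics.mean(xs), p)                  # sample mean vs. E[X] = p
print(statistics.pvariance(xs), p * (1 - p))   # sample variance vs. Var[X] = p(1 - p)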

Skewness

The skewness is (q - p)/√(pq) = (1 - 2p)/√(pq). When we take the standardized Bernoulli distributed random variable (X - E[X])/√(Var[X]), we find that this random variable attains q/√(pq) with probability p and attains -p/√(pq) with probability q. Thus we get

γ₁ = E[((X - E[X])/√(Var[X]))³]
= p·(q/√(pq))³ + q·(-p/√(pq))³
= (1/√(pq)³)(pq³ - qp³)
= (pq/√(pq)³)(q² - p²)
= ((1 - p)² - p²)/√(pq)
= (1 - 2p)/√(pq)
= (q - p)/√(pq).
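
The closed form (q - p)/√(pq) can likewise be compared with the sample skewness of simulated draws; the following sketch is illustrative only:

import math
import random

random.seed(2)
p, q = 0.2, 0.8
xs = [1 if random.random() < p else 0 for _ in range(200_000)]

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
sample_skew = sum((x - mean) ** 3 for x in xs) / len(xs) / var ** 1.5

print(sample_skew, (q - p) / math.sqrt(p * q))  # both close to 1.5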

Higher moments and cumulants

The raw moments are all equal because 1^k = 1 and 0^k = 0.

E[X^k] = Pr(X = 1)·1^k + Pr(X = 0)·0^k = p·1 + q·0 = p = E[X].

The central moment of order k is given by

μ_k = (1 - p)(-p)^k + p(1 - p)^k.

The first six central moments are

μ₁ = 0,
μ₂ = p(1 - p),
μ₃ = p(1 - p)(1 - 2p),
μ₄ = p(1 - p)(1 - 3p(1 - p)),
μ₅ = p(1 - p)(1 - 2p)(1 - 2p(1 - p)),
μ₆ = p(1 - p)(1 - 5p(1 - p)(1 - p(1 - p))).

The higher central moments can be expressed more compactly in terms of μ₂ and μ₃:

μ₄ = μ₂(1 - 3μ₂),
μ₅ = μ₃(1 - 2μ₂),
μ₆ = μ₂(1 - 5μ₂(1 - μ₂)).

The first six cumulants are

κ₁ = p,
κ₂ = μ₂,
κ₃ = μ₃,
κ₄ = μ₂(1 - 6μ₂),
κ₅ = μ₃(1 - 12μ₂),
κ₆ = μ₂(1 - 30μ₂(1 - 4μ₂)).
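
These identities can be verified numerically from the general formula μ_k = (1 - p)(-p)^k + p(1 - p)^k; the short sketch below, with an arbitrarily chosen p and illustrative names, checks the compact expressions for μ₄, μ₅ and μ₆:

p = 0.3
q = 1 - p

def mu(k):
    # k-th central moment of a Bernoulli(p) variable
    return q * (-p) ** k + p * q ** k

mu2, mu3 = mu(2), mu(3)
print(mu(4), mu2 * (1 - 3 * mu2))              # both ≈ 0.0777
print(mu(5), mu3 * (1 - 2 * mu2))              # both ≈ 0.04872
print(mu(6), mu2 * (1 - 5 * mu2 * (1 - mu2)))  # both ≈ 0.035805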

Entropy and Fisher information

Entropy

Entropy is a measure of uncertainty or randomness in a probability distribution. For a Bernoulli random variable X with success probability p and failure probability q = 1 - p, the entropy H(X) is defined as:

H(X) = E_p[ln(1/P(X))] = -[P(X = 0) ln P(X = 0) + P(X = 1) ln P(X = 1)]
H(X) = -(q ln q + p ln p),   where q = P(X = 0) and p = P(X = 1)

The entropy is maximized when p=0.5, indicating the highest level of uncertainty when both outcomes are equally likely. The entropy is zero when p=0 or p=1, where one outcome is certain.
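
A small sketch of this entropy as a function of p (the function name is illustrative), using the convention 0·ln 0 = 0:

import math

def bernoulli_entropy(p):
    # H(X) = -(q ln q + p ln p), with 0 * ln 0 taken to be 0
    q = 1 - p
    h = 0.0
    if p > 0:
        h -= p * math.log(p)
    if q > 0:
        h -= q * math.log(q)
    return h

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, bernoulli_entropy(p))  # maximal (ln 2) at p = 0.5, zero at p = 0 or 1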

Fisher information

Fisher information measures the amount of information that an observable random variable X carries about an unknown parameter p upon which the probability of X depends. For the Bernoulli distribution, the Fisher information with respect to the parameter p is given by:

I(p) = 1/(pq)

Proof:

  • The Likelihood Function for a Bernoulli random variable X is:
L(p;X) = p^X (1 - p)^(1-X)

This represents the probability of observing X given the parameter p.

  • The Log-Likelihood Function is:
ln L(p;X) = X ln p + (1 - X) ln(1 - p)
  • The Score Function (the first derivative of the log-likelihood w.r.t. p) is:
∂/∂p ln L(p;X) = X/p - (1 - X)/(1 - p)
  • The second derivative of the log-likelihood function is:
∂²/∂p² ln L(p;X) = -X/p² - (1 - X)/(1 - p)²
  • Fisher information is calculated as the negative expected value of the second derivative of the log-likelihood:
I(p) = -E[∂²/∂p² ln L(p;X)] = -(-p/p² - (1 - p)/(1 - p)²) = 1/p + 1/(1 - p) = 1/(p(1 - p)) = 1/(pq)

Since I(p) = 1/(p(1 - p)) is the reciprocal of the variance, the Fisher information is minimized at p = 0.5, where the outcome of a single trial is most uncertain, and it grows without bound as p approaches 0 or 1.
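
Since the Fisher information also equals the variance of the score, the closed form 1/(pq) can be checked by simulation; the sketch below (illustrative only) averages the squared score over simulated draws:

import random

random.seed(3)
p = 0.3
n = 200_000
xs = [1 if random.random() < p else 0 for _ in range(n)]

# score of one observation: d/dp ln L(p; x) = x/p - (1 - x)/(1 - p)
scores = [x / p - (1 - x) / (1 - p) for x in xs]
mc_info = sum(s * s for s in scores) / n  # E[score^2], the Fisher information

print(mc_info, 1 / (p * (1 - p)))  # both close to 4.76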

The Bernoulli distribution is simply B(1,p), also written as Bernoulli(p).

See also

References

Template:Reflist

Further reading

Template:Commons category

Template:ProbDistributions