Optimal estimation

From testwiki
Revision as of 05:59, 24 October 2020 by imported>Citation bot (Alter: template type. Add: doi. | You can use this bot yourself. Report bugs here. | Suggested by AManWithNoPlan | All pages linked from cached copy of User:AManWithNoPlan/sandbox3 | via #UCB_webform_linked 357/2623)
(diff) ← Older revision | Latest revision (diff) | Newer revision β†’ (diff)
Jump to navigation Jump to search

In applied statistics, optimal estimation is a regularized matrix inverse method based on Bayes' theorem. It is used very commonly in the geosciences, particularly for atmospheric sounding. A matrix inverse problem looks like this:

𝐀xβ†’=yβ†’

The essential concept is to transform the matrix, A, into a conditional probability and the variables, x→ and y→ into probability distributions by assuming Gaussian statistics and using empirically-determined covariance matrices.

Derivation

Typically, one expects the statistics of most measurements to be Gaussian. So for example for P(y→|x→), we can write:

P(yβ†’|xβ†’)=1(2π)mn/2|π‘Ίπ’š|exp[12(𝑨xβ†’yβ†’)Tπ‘Ίπ’š1(𝑨xβ†’yβ†’)]

where m and n are the numbers of elements in xβ†’ and yβ†’ respectively 𝑨 is the matrix to be solved (the linear or linearised forward model) and π‘Ίπ’š is the covariance matrix of the vector yβ†’. This can be similarly done for xβ†’:

P(xβ†’)=1(2π)m/2|𝑺𝒙𝒂|exp[12(xβ†’xa^)T𝑺𝒙𝒂1(xβ†’xa^)]

Here P(xβ†’) is taken to be the so-called "a-priori" distribution: xa^ denotes the a-priori values for xβ†’ while 𝑺𝒙𝒂 is its covariance matrix.

The nice thing about the Gaussian distributions is that only two parameters are needed to describe them and so the whole problem can be converted once again to matrices. Assuming that P(x→|y→) takes the following form:

P(xβ†’|yβ†’)=1(2π)mn/2|𝑺𝒙|exp[12(xβ†’x^)T𝑺𝒙1(xβ†’x^)]

P(y→) may be neglected since, for a given value of x→, it is simply a constant scaling term. Now it is possible to solve for both the expectation value of x→, x^, and for its covariance matrix by equating P(x→|y→) and P(y→|x→)P(x→). This produces the following equations:

𝑺𝒙=(𝑨TSπ’š1𝑨+S𝒙𝒂1)1
x^=xa^+𝑺𝒙𝑨Tπ‘Ίπ’š1(y→𝑨xa^)

Because we are using Gaussians, the expected value is equivalent to the maximum likely value, and so this is also a form of maximum likelihood estimation.

Typically with optimal estimation, in addition to the vector of retrieved quantities, one extra matrix is returned along with the covariance matrix. This is sometimes called the resolution matrix or the averaging kernel and is calculated as follows:

𝑹=(𝑨Tπ‘Ίπ’š1𝑨+𝑺𝒙𝒂1)1𝑨Tπ‘Ίπ’š1𝑨

This tells us, for a given element of the retrieved vector, how much of the other elements of the vector are mixed in. In the case of a retrieval of profile information, it typical indicates the altitude resolution for a given altitude. For instance if the resolution vectors for all the altitudes contain non-zero elements (to a numerical tolerance) in their four nearest neighbours, then the altitude resolution is only one fourth that of the actual grid size.

References