Innovation method


In statistics, the innovation method provides an estimator for the parameters of stochastic differential equations given a time series of (potentially noisy) observations of the state variables. In the framework of continuous-discrete state space models, the innovation estimator is obtained by maximizing the log-likelihood of the corresponding discrete-time innovation process with respect to the parameters. The innovation estimator can be classified as an M-estimator, a quasi-maximum likelihood estimator or a prediction error estimator, depending on which inferential considerations are emphasized. The innovation method is a system identification technique for developing mathematical models of dynamical systems from measured data and for the optimal design of experiments.

Background

Stochastic differential equations (SDEs) have become an important mathematical tool for describing the time evolution of a variety of random phenomena in the natural, social and applied sciences. Statistical inference for SDEs is thus of great importance in applications for model building, model selection, model identification and forecasting. To carry out statistical inference for SDEs, measurements of the state variables of these random phenomena are indispensable. Usually, in practice, only a few state variables are measured, by physical devices that introduce random measurement errors (observational errors).

Mathematical model for inference

The innovation estimator[1] for SDEs is defined in the framework of continuous-discrete state space models.[2] These models arise as natural mathematical representations of the temporal evolution of continuous random phenomena and their measurements at a succession of time instants. In the simplest formulation, these continuous-discrete models[2] are expressed in terms of an SDE of the form

$$d\mathbf{x}(t)=\mathbf{f}(t,\mathbf{x}(t);\theta)\,dt+\sum_{i=1}^{m}\mathbf{g}_{i}(t,\mathbf{x}(t);\theta)\,d\mathbf{w}^{i}(t) \qquad (1)$$

describing the time evolution of the $d$ state variables $\mathbf{x}$ of the phenomenon for all time instants $t \ge t_0$, and an observation equation

$$\mathbf{z}_{t_k}=\mathbf{C}\mathbf{x}(t_k)+\mathbf{e}_{t_k} \qquad (2)$$

describing the time series of measurements $\mathbf{z}_{t_0},\ldots,\mathbf{z}_{t_{M-1}}$ of at least one of the variables $\mathbf{x}$ of the random phenomenon at $M$ time instants $t_0,\ldots,t_{M-1}$. In the model (1)-(2), $\mathbf{f}$ and $\mathbf{g}_i$ are differentiable functions, $\mathbf{w}=(\mathbf{w}^1,\ldots,\mathbf{w}^m)$ is an $m$-dimensional standard Wiener process, $\theta \in \mathbb{R}^p$ is a vector of $p$ parameters, $\{\mathbf{e}_{t_k} : \mathbf{e}_{t_k} \sim N(0,\Pi_{t_k})\}_{k=0,\ldots,M-1}$ is a sequence of $r$-dimensional i.i.d. Gaussian random vectors independent of $\mathbf{w}$, $\Pi_{t_k}$ an $r \times r$ positive definite matrix, and $\mathbf{C}$ an $r \times d$ matrix.
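A model of the form (1)-(2) can be simulated directly. The following sketch (all function names are assumptions, with a single driving Wiener process, i.e. $m=1$) integrates the SDE between observation times by the Euler-Maruyama scheme and then applies the noisy linear observation equation:

```python
import numpy as np

def simulate_cd_model(f, g, C, Pi, theta, x0, obs_times, dt=1e-3, seed=0):
    """Simulate the continuous-discrete model (1)-(2) with m = 1:
    Euler-Maruyama steps for the SDE between observation times,
    then a noisy linear observation z_k = C x(t_k) + e_k."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    t = obs_times[0]
    zs = []
    for t_next in obs_times:
        while t < t_next - 1e-12:          # integrate the SDE up to the next t_k
            h = min(dt, t_next - t)
            dw = rng.normal(0.0, np.sqrt(h))
            x = x + f(t, x, theta) * h + g(t, x, theta) * dw
            t += h
        e = rng.multivariate_normal(np.zeros(len(Pi)), Pi)
        zs.append(C @ x + e)               # observation equation (2)
    return np.array(zs)
```

For instance, a two-dimensional linear drift `f = lambda t, x, th: -th * x` with constant diffusion gives a partially observed Ornstein-Uhlenbeck process when only the first component is measured through `C`.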

Statistical problem to solve

Once the dynamics of a phenomenon is described by a state equation as in (1), and the way the state variables are measured is specified by an observation equation as in (2), the inference problem to solve is the following:[1][3] given $M$ partial and noisy observations $\mathbf{z}_{t_0},\ldots,\mathbf{z}_{t_{M-1}}$ of the stochastic process $\mathbf{x}$ at the observation times $t_0,\ldots,t_{M-1}$, estimate the unobserved state variables of $\mathbf{x}$ and the unknown parameters $\theta$ in (1) that best fit the given observations.

Discrete-time innovation process

Let $\{t\}_M$ denote the sequence of $M$ observation times $t_0,\ldots,t_{M-1}$ of the states of (1), and $Z_{\rho}=\{\mathbf{z}_{t_k} : t_k \le \rho,\ t_k \in \{t\}_M\}$ the time series of partial and noisy measurements of $\mathbf{x}$ described by the observation equation (2).

Further, let $\mathbf{x}_{t/\rho}=E(\mathbf{x}(t)\mid Z_{\rho})$ and $\mathbf{U}_{t/\rho}=E(\mathbf{x}(t)\mathbf{x}^{\intercal}(t)\mid Z_{\rho})-\mathbf{x}_{t/\rho}\mathbf{x}_{t/\rho}^{\intercal}$ be the conditional mean and variance of $\mathbf{x}$ with $\rho \le t$, where $E(\cdot)$ denotes the expected value of random vectors.

The random sequence $\{\nu_{t_k}\}_{k=1,\ldots,M-1}$, with

$$\nu_{t_k}=\mathbf{z}_{t_k}-\mathbf{C}\mathbf{x}_{t_k/t_{k-1}}(\theta), \qquad (3)$$

defines the discrete-time innovation process,[4][1][5] where $\nu_{t_k}$ can be shown to be an independent, normally distributed random vector with zero mean and variance

$$\Sigma_{t_k}=\mathbf{C}\mathbf{U}_{t_k/t_{k-1}}(\theta)\,\mathbf{C}^{\intercal}+\Pi_{t_k}, \qquad (4)$$

for small enough $\Delta=\max_{k}\{t_{k+1}-t_k\}$, with $t_k,t_{k+1}\in\{t\}_M$. In practice,[6] this distribution for the discrete-time innovation is valid when, with a suitable selection of both the number $M$ of observations and the time distance $t_{k+1}-t_k$ between consecutive observations, the time series of observations $\mathbf{z}_{t_0},\ldots,\mathbf{z}_{t_{M-1}}$ of the SDE contains the main information about the continuous-time process $\mathbf{x}$, that is, when the sampling of the continuous-time process $\mathbf{x}$ has low distortion (aliasing) and a suitable signal-to-noise ratio.

Innovation estimator

The innovation estimator for the parameters of the SDE (1) is the one that maximizes the likelihood function of the discrete-time innovation process $\{\nu_{t_k}\}_{k=1,\ldots,M-1}$ with respect to the parameters.[1] More precisely, given $M$ measurements $Z_{t_{M-1}}$ of the state space model (1)-(2) with $\theta=\theta_0$ on $\{t\}_M$, the innovation estimator for the parameters $\theta_0$ of (1) is defined by

$$\hat{\theta}_M=\arg\left\{\min_{\theta}\, U_M(\theta,Z_{t_{M-1}})\right\}, \qquad (5)$$

where

$$U_M(\theta,Z_{t_{M-1}})=(M-1)\ln(2\pi)+\sum_{k=1}^{M-1}\left\{\ln(\det(\Sigma_{t_k}))+\nu_{t_k}^{\intercal}\Sigma_{t_k}^{-1}\nu_{t_k}\right\},$$

where $\nu_{t_k}$ is the discrete-time innovation (3) and $\Sigma_{t_k}$ the innovation variance (4) of the model (1)-(2) at $t_k$, for all $k=1,\ldots,M-1$. In the above expression for $U_M(\theta,Z_{t_{M-1}})$, the conditional mean $\mathbf{x}_{t_k/t_{k-1}}(\theta)$ and variance $\mathbf{U}_{t_k/t_{k-1}}(\theta)$ are computed by the continuous-discrete filtering algorithm for the evolution of the moments (Section 6.4 in [2]), for all $k=1,\ldots,M-1$.
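Given the one-step predictions of the filter, the criterion $U_M$ can be evaluated directly from the innovations and their variances. A minimal sketch (function and argument names are assumptions; the additive constant follows the text and does not affect the minimizer):

```python
import numpy as np

def innovation_criterion(zs, x_pred, U_pred, C, Pi):
    """Evaluate U_M in (5) from the innovations (3) and their
    variances (4), given the one-step prediction means x_pred[k]
    and covariances U_pred[k] for each observation zs[k]."""
    M = len(zs)
    U = (M - 1) * np.log(2 * np.pi)
    for k in range(1, M):
        nu = zs[k] - C @ x_pred[k]               # innovation (3)
        Sigma = C @ U_pred[k] @ C.T + Pi         # innovation variance (4)
        sign, logdet = np.linalg.slogdet(Sigma)  # stable log-determinant
        U += logdet + nu @ np.linalg.solve(Sigma, nu)
    return U
```

Minimizing this quantity over the parameter vector, with the predictions recomputed for each candidate value, yields the estimator (5).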

Differences with the maximum likelihood estimator

The maximum likelihood estimator of the parameters $\theta$ in the model (1)-(2) involves the evaluation of the, usually unknown, transition density function $p_{\theta}(t_{k+1}-t_k,\mathbf{x}(t_k),\mathbf{x}(t_{k+1}))$ between the states $\mathbf{x}(t_k)$ and $\mathbf{x}(t_{k+1})$ of the diffusion process $\mathbf{x}$ for all the observation times $t_k$ and $t_{k+1}$.[7] Instead, the innovation estimator (5) is obtained by maximizing the likelihood of the discrete-time innovation process $\{\nu_{t_k}\}_{k=1,\ldots,M-1}$, taking into account that $\nu_{t_1},\ldots,\nu_{t_{M-1}}$ are Gaussian and independent random vectors. Remarkably, whereas the transition density function $p_{\theta}(t_{k+1}-t_k,\mathbf{x}(t_k),\mathbf{x}(t_{k+1}))$ changes when the SDE for $\mathbf{x}$ does, the transition density function $\mathfrak{p}_{\theta}(t_{k+1}-t_k,\nu_{t_k},\nu_{t_{k+1}})$ for the innovation process remains Gaussian independently of the SDE for $\mathbf{x}$. Only when the diffusion $\mathbf{x}$ is described by a linear SDE with additive noise is the density function $p_{\theta}(t_{k+1}-t_k,\mathbf{x}(t_k),\mathbf{x}(t_{k+1}))$ Gaussian and equal to $\mathfrak{p}_{\theta}(t_{k+1}-t_k,\nu_{t_k},\nu_{t_{k+1}})$, and so the maximum likelihood estimator and the innovation estimator coincide.[5] Otherwise,[5] the innovation estimator is an approximation to the maximum likelihood estimator and, in this sense, the innovation estimator is a quasi-maximum likelihood estimator. In addition, the innovation method is a particular instance of the prediction error method according to the definition given in.[8] Therefore, the asymptotic results obtained for that general class of estimators are valid for the innovation estimators.[1][9][10] Intuitively, following the typical control engineering viewpoint, the innovation process, viewed as a measure of the prediction errors of the fitted model, is expected to be approximately a white noise process when the model fits the data,[11][3] which can be used as a practical tool for model design and for optimal experimental design.[6]
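The white-noise intuition above suggests a simple residual diagnostic: compute the sample autocorrelation of the fitted innovations and check that it stays within the usual pointwise $\pm 2/\sqrt{M}$ bands. A hypothetical sketch (the function name and the band convention are assumptions, not part of the method's formal definition):

```python
import numpy as np

def innovations_look_white(nu, n_lags=10):
    """Return the sample autocorrelations of a scalar innovation
    sequence at lags 1..n_lags, and whether they all fall inside
    the approximate +-2/sqrt(M) white-noise bands."""
    nu = np.asarray(nu, dtype=float)
    nu = nu - nu.mean()
    M = len(nu)
    denom = float(nu @ nu)
    acf = np.array([float(nu[:-k] @ nu[k:]) / denom
                    for k in range(1, n_lags + 1)])
    return acf, bool(np.all(np.abs(acf) < 2.0 / np.sqrt(M)))
```

Innovations from a well-fitted model should pass this check, while a misspecified model typically leaves visible serial correlation in the residuals.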

Properties

The innovation estimator (5) has a number of important attributes:

  • Approximate confidence limits for the parameters $\theta_0$ are given by

$$\hat{\theta}_M \mp t_{1-\alpha,\,M-p-1}\,\sqrt{\mathrm{diag}(Var(\hat{\theta}_M))},$$

where $t_{1-\alpha,\,M-p-1}$ is the value of the Student's t distribution with $100(1-\alpha)\%$ significance level and $M-p-1$ degrees of freedom. Here, $Var(\hat{\theta}_M)=(\mathbf{I}(\hat{\theta}_M))^{-1}$ denotes the variance of the innovation estimator $\hat{\theta}_M$, where

$$\mathbf{I}(\hat{\theta}_M)=\sum_{k=1}^{M-1}\mathbf{I}_k(\hat{\theta}_M)$$

is the Fisher information matrix of the innovation estimator $\hat{\theta}_M$ of $\theta_0$ and

$$[\mathbf{I}_k(\hat{\theta}_M)]_{m,n}=\frac{\partial \mu^{\intercal}}{\partial \theta_m}\Sigma^{-1}\frac{\partial \mu}{\partial \theta_n}+\frac{1}{2}\,\mathrm{trace}\!\left(\Sigma^{-1}\frac{\partial \Sigma}{\partial \theta_m}\Sigma^{-1}\frac{\partial \Sigma}{\partial \theta_n}\right)$$

is the entry $(m,n)$ of the matrix $\mathbf{I}_k(\hat{\theta}_M)$, with $\mu=\mathbf{C}\mathbf{x}_{t_k/t_{k-1}}(\hat{\theta}_M)$ and $\Sigma=\Sigma_{t_k}(\hat{\theta}_M)$, for $1 \le m,n \le p$.

  • For smooth enough functions $\mathbf{R}$, nonlinear observation equations of the form

$$\mathbf{z}_{t_k}=\mathbf{R}(t_k,\mathbf{x}(t_k))+\mathbf{e}_{t_k}, \qquad (6)$$

can be transformed into the simpler form (2), and the innovation estimator (5) can be applied.[5]
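The confidence-limit computation above is mechanical once the Fisher information matrix has been assembled; a hypothetical sketch using SciPy's Student-t quantile (all numbers and names illustrative, not from any real fit):

```python
import numpy as np
from scipy.stats import t as student_t

def confidence_limits(theta_hat, I_total, M, alpha=0.05):
    """Confidence limits theta_hat -+ t_{1-alpha, M-p-1} *
    sqrt(diag(Var)), with Var = I(theta_hat)^{-1} the inverse
    of the (summed) Fisher information matrix."""
    p = len(theta_hat)
    var = np.linalg.inv(I_total)                       # Var(theta_hat)
    half = student_t.ppf(1 - alpha, M - p - 1) * np.sqrt(np.diag(var))
    return theta_hat - half, theta_hat + half
```

The quantile convention mirrors the text's $100(1-\alpha)\%$ significance level; other interval conventions would change only the quantile argument.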

Approximate Innovation estimators

In practice, closed-form expressions for computing $\mathbf{x}_{t_k/t_{k-1}}(\theta)$ and $\mathbf{U}_{t_k/t_{k-1}}(\theta)$ in (5) are available only for a few models (1)-(2). Therefore, approximate filtering algorithms such as the following are used in applications.

Given $M$ measurements $Z_{t_{M-1}}$ and the initial filter estimates $\mathbf{y}_{t_0/t_0}=\mathbf{x}_{t_0/t_0}$ and $\mathbf{V}_{t_0/t_0}=\mathbf{U}_{t_0/t_0}$, the approximate Linear Minimum Variance (LMV) filter for the model (1)-(2) is iteratively defined at each observation time $t_k\in\{t\}_M$ by the prediction estimates[2][13]

$$\mathbf{y}_{t_{k+1}/t_k}=E(\mathbf{y}(t_{k+1})\mid Z_{t_k}) \quad \text{and} \quad \mathbf{V}_{t_{k+1}/t_k}=E(\mathbf{y}(t_{k+1})\mathbf{y}^{\intercal}(t_{k+1})\mid Z_{t_k})-\mathbf{y}_{t_{k+1}/t_k}\mathbf{y}_{t_{k+1}/t_k}^{\intercal}, \qquad (7)$$

with initial conditions 𝐲tk/tk and 𝐕tk/tk, and the filter estimates

$$\mathbf{y}_{t_{k+1}/t_{k+1}}=\mathbf{y}_{t_{k+1}/t_k}+\mathbf{K}_{t_{k+1}}(\mathbf{z}_{t_{k+1}}-\mathbf{C}\mathbf{y}_{t_{k+1}/t_k}) \quad \text{and} \quad \mathbf{V}_{t_{k+1}/t_{k+1}}=\mathbf{V}_{t_{k+1}/t_k}-\mathbf{K}_{t_{k+1}}\mathbf{C}\mathbf{V}_{t_{k+1}/t_k}, \qquad (8)$$

with filter gain

$$\mathbf{K}_{t_{k+1}}=\mathbf{V}_{t_{k+1}/t_k}\mathbf{C}^{\intercal}\left(\mathbf{C}\mathbf{V}_{t_{k+1}/t_k}\mathbf{C}^{\intercal}+\Pi_{t_{k+1}}\right)^{-1}$$

for all tk,tk+1{t}M, where 𝐲 is an approximation to the solution 𝐱 of (1) on the observation times {t}M.
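The measurement update (8) with the gain above is a standard Kalman-type correction and can be sketched in a few lines (function and variable names are assumptions):

```python
import numpy as np

def lmv_update(y_pred, V_pred, z, C, Pi):
    """Measurement update (8) of the approximate LMV filter with
    filter gain K = V C' (C V C' + Pi)^{-1}."""
    S = C @ V_pred @ C.T + Pi                  # innovation variance
    K = V_pred @ C.T @ np.linalg.inv(S)        # filter gain
    y_filt = y_pred + K @ (z - C @ y_pred)     # corrected mean
    V_filt = V_pred - K @ C @ V_pred           # corrected covariance
    return y_filt, V_filt
```

The prediction step (7), by contrast, depends on how the moments of $\mathbf{y}$ are propagated between observations, which is exactly where the different approximate filters below differ.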

Given $M$ measurements $Z_{t_{M-1}}$ of the state space model (1)-(2) with $\theta=\theta_0$ on $\{t\}_M$, the approximate innovation estimator for the parameters $\theta_0$ of (1) is defined by[1][12]

$$\hat{\vartheta}_M=\arg\left\{\min_{\theta\in\mathcal{D}_{\theta}}\, \widetilde{U}_M(\theta,Z_{t_{M-1}})\right\}, \qquad (9)$$

where

$$\widetilde{U}_M(\theta,Z_{t_{M-1}})=(M-1)\ln(2\pi)+\sum_{k=1}^{M-1}\left\{\ln(\det(\widetilde{\Sigma}_{t_k}))+\widetilde{\nu}_{t_k}^{\intercal}(\widetilde{\Sigma}_{t_k})^{-1}\widetilde{\nu}_{t_k}\right\},$$

where

$$\widetilde{\nu}_{t_k}=\mathbf{z}_{t_k}-\mathbf{C}\mathbf{y}_{t_k/t_{k-1}}(\theta) \quad \text{and} \quad \widetilde{\Sigma}_{t_k}=\mathbf{C}\mathbf{V}_{t_k/t_{k-1}}(\theta)\mathbf{C}^{\intercal}+\Pi_{t_k}$$

are approximations to the discrete-time innovation (3) and innovation variance (4), respectively, resulting from the filtering algorithm (7)-(8).

For models with complete observations free of noise (i.e., with $\mathbf{C}=\mathbf{I}$ and $\Pi_{t_k}=0$ in (2)), the approximate innovation estimator (9) reduces to the known quasi-maximum likelihood estimators for SDEs.[12]
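To make (9) concrete, the sketch below fits the drift parameter of a scalar Ornstein-Uhlenbeck model $dx=-\theta x\,dt+\sigma\,dw$ observed through $z_k=x(t_k)+e_k$. The model choice and all names are illustrative assumptions; for this linear SDE the exact one-step predictor moments are available in closed form, so the inner filter is the exact LMV (Kalman) filter, and `scipy.optimize.minimize_scalar` performs the minimization in (9):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def innovation_estimate_ou(zs, dt, sigma, Pi, bounds=(0.05, 5.0)):
    """Innovation estimator (9) for theta in the scalar model
    dx = -theta x dt + sigma dw, z_k = x(t_k) + e_k, observed on
    an equidistant grid with stepsize dt."""
    zs = np.asarray(zs, dtype=float)

    def U(theta):
        a = np.exp(-theta * dt)                       # state transition
        q = sigma**2 * (1.0 - a * a) / (2.0 * theta)  # process noise variance
        y, V = zs[0], Pi                              # filter initialization
        total = (len(zs) - 1) * np.log(2.0 * np.pi)
        for z in zs[1:]:
            y_pred, V_pred = a * y, a * a * V + q     # prediction (7)
            nu = z - y_pred                           # innovation (3)
            Sig = V_pred + Pi                         # innovation variance (4)
            total += np.log(Sig) + nu * nu / Sig
            K = V_pred / Sig                          # gain, update (8)
            y, V = y_pred + K * nu, V_pred - K * V_pred
        return total

    return minimize_scalar(U, bounds=bounds, method="bounded").x
```

Replacing the exact transition moments by, e.g., an Euler or Local Linearization approximation turns this into one of the approximate innovation estimators discussed below.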

Main conventional-type estimators

Conventional-type innovation estimators are those estimators (9) derived from conventional-type continuous-discrete or discrete-discrete approximate filtering algorithms. Among those based on approximate continuous-discrete filters are the innovation estimators based on Local Linearization (LL) filters,[1][14][5] on the extended Kalman filter,[15][16] and on second order filters.[3][16] Approximate innovation estimators based on discrete-discrete filters result from the discretization of the SDE (1) by means of a numerical scheme.[17][18] Typically, the effectiveness of these innovation estimators is directly related to the stability of the involved filtering algorithms.

A shared drawback of these conventional-type filters is that, once the observations are given, the error between the approximate and the exact innovation process is fixed and completely determined by the time distance between observations.[12] This can introduce a large bias in the approximate innovation estimators in some applications, a bias that cannot be corrected by increasing the number of observations. However, the conventional-type innovation estimators are useful in many practical situations for which only medium or low accuracy of the parameter estimation is required.[12]

Order-β innovation estimators

Consider the finer time discretization $(\tau)_{h>0}=\{\tau_n : \tau_{n+1}-\tau_n \le h \text{ for } n=0,1,\ldots,N\}$ of the time interval $[t_0,t_{M-1}]$ satisfying the condition $(\tau)_h \supset \{t\}_M$. Further, let $\mathbf{y}_n$ be the approximate value of $\mathbf{x}(\tau_n)$ obtained from a discretization of the equation (1) for all $\tau_n\in(\tau)_h$, and

$$\mathbf{y}=\{\mathbf{y}(t),\ t\in[t_0,t_{M-1}] : \mathbf{y}(\tau_n)=\mathbf{y}_n,\ \text{for all}\ \tau_n\in(\tau)_h\} \qquad (10)$$

a continuous-time approximation to 𝐱.

An order-$\beta$ LMV filter[13] is an approximate LMV filter for which $\mathbf{y}$ is an order-$\beta$ weak approximation to $\mathbf{x}$ satisfying (10) and the weak convergence condition

$$\sup_{t_k\le t\le t_{k+1}}\left|E(g(\mathbf{x}(t))\mid Z_{t_k})-E(g(\mathbf{y}(t))\mid Z_{t_k})\right|\le L_k h^{\beta}$$

for all $t_k,t_{k+1}\in\{t\}_M$ and any $2(\beta+1)$ times continuously differentiable function $g:\mathbb{R}^d\to\mathbb{R}$ for which $g$ and all its partial derivatives up to order $2(\beta+1)$ have polynomial growth, where $L_k$ is a positive constant. This order-$\beta$ LMV filter converges with rate $\beta$ to the exact LMV filter as $h$ goes to zero,[13] where $h$ is the maximum stepsize of the time discretization $(\tau)_h\supset\{t\}_M$ on which the approximation $\mathbf{y}$ to $\mathbf{x}$ is defined.

An order-$\beta$ innovation estimator is an approximate innovation estimator (9) for which the approximations to the discrete-time innovation (3) and innovation variance (4) result from an order-$\beta$ LMV filter.[12]

Approximations $\mathbf{y}$ of any kind converging to $\mathbf{x}$ in a weak sense (as, e.g., those in [19][13]) can be used to design an order-$\beta$ LMV filter and, consequently, an order-$\beta$ innovation estimator. These order-$\beta$ innovation estimators are intended for the common practical situation in which a diffusion process must be identified from a reduced number of observations distant in time, or in which high accuracy for the estimated parameters is required.

Properties

An order-β innovation estimator θ^M(h) has a number of important properties:[12][6]

  • For each given dataset $Z_{t_{M-1}}$ of $M$ observations, $\hat{\theta}_M(h)$ converges to the exact innovation estimator $\hat{\theta}_M$ as the maximum stepsize $h$ of the time discretization $(\tau)_h\supset\{t\}_M$ goes to zero.
  • For finite samples of M observations, the expected value of θ^M(h) converges to the expected value of the exact innovation estimator θ^M as h goes to zero.
  • For an increasing number of observations, $\hat{\theta}_M(h)$ is asymptotically normally distributed and its bias decreases as $h$ goes to zero.
  • As with the convergence of the order-$\beta$ LMV filter to the exact LMV filter, the convergence and asymptotic properties of $\hat{\theta}_M(h)$ impose no constraints on the time distance $t_{k+1}-t_k$ between two consecutive observations $\mathbf{z}_{t_k}$ and $\mathbf{z}_{t_{k+1}}$, nor on the time discretization $(\tau)_h\supset\{t\}_M$.
  • Approximations for the Akaike or Bayesian information criterion and for confidence limits are directly obtained by replacing the exact estimator $\hat{\theta}_M$ with its approximation $\hat{\theta}_M(h)$. These approximations converge to the corresponding exact ones as the maximum stepsize $h$ of the time discretization $(\tau)_h\supset\{t\}_M$ goes to zero.
  • The distribution of the approximate fitting-innovation process $\{\widetilde{\nu}_{t_k} : \widetilde{\nu}_{t_k}=\mathbf{z}_{t_k}-\mathbf{C}\mathbf{y}_{t_k/t_{k-1}}(\hat{\theta}_M(h))\}_{k=1,\ldots,M-1}$ measures the goodness of fit of the model to the data, which is also used as a practical tool for model design and for optimal experimental design.
  • For smooth enough functions $\mathbf{R}$, nonlinear observation equations of the form (6) can be transformed into the simpler form (2), and the order-$\beta$ innovation estimator can be applied.
Fig. 1 Histograms of the differences $(\hat{\alpha}_M-\hat{\alpha}^D_{h,M},\ \hat{\sigma}_M-\hat{\sigma}^D_{h,M})$ and $(\hat{\alpha}_M-\hat{\alpha}_{h,M},\ \hat{\sigma}_M-\hat{\sigma}_{h,M})$ between the exact innovation estimator $(\hat{\alpha}_M,\hat{\sigma}_M)$ and the conventional $(\hat{\alpha}^D_{h,M},\hat{\sigma}^D_{h,M})$ and order-1 $(\hat{\alpha}_{h,M},\hat{\sigma}_{h,M})$ innovation estimators for the parameters $(\alpha,\sigma)$ of the model (11)-(12), given 100 time series of $M=10$ noisy observations on the time interval $[0.5,\,0.5+(M-1)\Delta]$ with sampling period $\Delta=1$.

Figure 1 presents the histograms of the differences $(\hat{\alpha}_M-\hat{\alpha}^D_{h,M},\ \hat{\sigma}_M-\hat{\sigma}^D_{h,M})$ and $(\hat{\alpha}_M-\hat{\alpha}_{h,M},\ \hat{\sigma}_M-\hat{\sigma}_{h,M})$ between the exact innovation estimator $(\hat{\alpha}_M,\hat{\sigma}_M)$ and the conventional $(\hat{\alpha}^D_{h,M},\hat{\sigma}^D_{h,M})$ and order-1 $(\hat{\alpha}_{h,M},\hat{\sigma}_{h,M})$ innovation estimators for the parameters $\alpha=0.1$ and $\sigma=0.1$ of the equation[12]

$$dx=\alpha t x\,dt+\sigma \sqrt{t}\,x\,dw \qquad (11)$$

obtained from 100 time series $z_{t_0},\ldots,z_{t_{M-1}}$ of $M$ noisy observations

$$z_{t_k}=x(t_k)+e_{t_k}, \quad \text{for } k=0,1,\ldots,M-1, \qquad (12)$$

of $x$ at the observation times $\{t\}_{M=10}=\{t_k=0.5+k\Delta : k=0,\ldots,M-1,\ \Delta=1\}$, with $x(0.5)=1$ and $\Pi_k=0.0001$. The classical and the order-1 Local Linearization filters of the innovation estimators $(\hat{\alpha}^D_{h,M},\hat{\sigma}^D_{h,M})$ and $(\hat{\alpha}_{h,M},\hat{\sigma}_{h,M})$ are defined as in,[12] respectively, on the uniform time discretizations $(\tau)_{h=\Delta}=\{t\}_M$ and $(\tau)_{h=\Delta/2,\,\Delta/8,\,\Delta/32}=\{\tau_n : \tau_n=0.5+nh,\ \text{with}\ n=0,1,\ldots,(M-1)/h\}$. The number of stochastic simulations of the order-1 Local Linearization filter is estimated via an adaptive sampling algorithm with moderate tolerance. Figure 1 illustrates the convergence of the order-1 innovation estimator $(\hat{\alpha}_{h,M},\hat{\sigma}_{h,M})$ to the exact innovation estimator $(\hat{\alpha}_M,\hat{\sigma}_M)$ as $h$ decreases, which substantially improves the estimation provided by the conventional innovation estimator $(\hat{\alpha}^D_{\Delta,M},\hat{\sigma}^D_{\Delta,M})$.

Deterministic approximations

The order-$\beta$ innovation estimators overcome the drawback of the conventional-type innovation estimators concerning the impossibility of reducing their bias.[12] However, the viable bias reduction of an order-$\beta$ innovation estimator might eventually require that the associated order-$\beta$ LMV filter perform a large number of stochastic simulations.[13] In situations where only low or medium precision approximate estimators are needed, an alternative deterministic filter algorithm, called the deterministic order-$\beta$ LMV filter,[13] can be obtained by tracking the first two conditional moments $\mu$ and $\Lambda$ of the order-$\beta$ weak approximation $\mathbf{y}$ at all the time instants $\tau_n\in(\tau)_h$ in between two consecutive observation times $t_k$ and $t_{k+1}$. That is, the values of the predictions $\mathbf{y}_{t_{k+1}/t_k}$ and $\mathbf{P}_{t_{k+1}/t_k}$ in the filtering algorithm are computed from the recursive formulas

$$\mathbf{y}_{\tau_{n+1}/t_k}=\mu(\tau_n,\mathbf{y}_{\tau_n/t_k};h_n) \quad \text{and} \quad \mathbf{P}_{\tau_{n+1}/t_k}=\Lambda(\tau_n,\mathbf{P}_{\tau_n/t_k};h_n), \quad \text{with}\ \tau_n,\tau_{n+1}\in(\tau)_h\cap[t_k,t_{k+1}],$$

and with $h_n=\tau_{n+1}-\tau_n$. The approximate innovation estimators $\hat{\theta}_{h,M}$ defined with these deterministic order-$\beta$ LMV filters no longer converge to the exact innovation estimator, but they allow a significant bias reduction in the estimated parameters for a given finite sample at a lower computational cost.
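For a linear SDE $dx=\mathbf{A}x\,dt+\mathbf{B}\,dw$ (an illustrative assumption; the deterministic filter applies more generally), the recursions for $\mu$ and $\Lambda$ reduce to deterministic steps on the moment ODEs. A hypothetical sketch of the between-observation prediction, using Euler steps on the fine grid $(\tau)_h$:

```python
import numpy as np

def deterministic_moment_prediction(mu, P, A, B, step_sizes):
    """Propagate the first two conditional moments between two
    observation times on the fine grid (tau)_h, as a deterministic
    order-beta LMV filter does.  Here: Euler steps on the moment
    ODEs  dmu/dt = A mu  and  dP/dt = A P + P A' + B B'  of the
    linear SDE dx = A x dt + B dw (linear model assumed)."""
    Q = B @ B.T
    for h in step_sizes:                      # h_n = tau_{n+1} - tau_n
        mu = mu + h * (A @ mu)
        P = P + h * (A @ P + P @ A.T + Q)
    return mu, P
```

No random sampling is involved: the cost per observation interval is fixed by the number of grid steps, which is what makes this variant cheaper than its stochastic counterpart.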

File:WikiInnFigure.jpg
Fig. 2 Histograms and confidence limits for the innovation estimators $(\hat{\alpha}_{h,M},\hat{\sigma}_{h,M})$ and $(\hat{\alpha}_{\cdot,M},\hat{\sigma}_{\cdot,M})$ of $(\alpha,\sigma)$ computed with the deterministic order-1 LL filter on uniform $(\tau)_{h}$ and adaptive $(\tau)_{\cdot}$ time discretizations, respectively, from 100 noisy realizations of the Van der Pol model (13)-(15) with sampling period $\Delta=1$ on the time interval $[0,M-1]$ and $M=30$. Observe the bias reduction of the estimated parameters as $h$ decreases.

Figure 2 presents the histograms and the confidence limits of the approximate innovation estimators $(\hat{\alpha}_{h,M},\hat{\sigma}_{h,M})$ and $(\hat{\alpha}_{\cdot,M},\hat{\sigma}_{\cdot,M})$ for the parameters $\alpha=1$ and $\sigma=1$ of the Van der Pol oscillator with random frequency[12]

$$dx_1=x_2\,dt \qquad (13)$$

$$dx_2=\left(-(x_1^2-1)x_2-\alpha x_1\right)dt+\sigma x_1\,dw \qquad (14)$$

obtained from 100 time series $z_{t_0},\ldots,z_{t_{M-1}}$ of $M$ partial and noisy observations

$$z_{t_k}=x_1(t_k)+e_{t_k}, \quad \text{for } k=0,1,\ldots,M-1, \qquad (15)$$

of $x_1$ at the observation times $\{t\}_{M=30}=\{t_k=k\Delta : k=0,\ldots,M-1,\ \Delta=1\}$, with $(x_1(0),x_2(0))=(1,1)$ and $\Pi_k=0.001$. The deterministic order-1 Local Linearization filter of the innovation estimators $(\hat{\alpha}_{h,M},\hat{\sigma}_{h,M})$ and $(\hat{\alpha}_{\cdot,M},\hat{\sigma}_{\cdot,M})$ is defined,[12] for each estimator, on the uniform time discretizations $(\tau)_h=\{\tau_n : \tau_n=nh,\ \text{with}\ n=0,1,\ldots,(M-1)/h\}$ and on an adaptive time-stepping discretization $(\tau)_{\cdot}$ with moderate relative and absolute tolerances, respectively. Observe the bias reduction of the estimated parameters as $h$ decreases.

Software

A Matlab implementation of various approximate innovation estimators is provided by the SdeEstimation toolbox.[20] This toolbox implements Local Linearization filters, including deterministic and stochastic variants with fixed step sizes and sample numbers, as well as adaptive time-stepping and sampling algorithms, along with local and global optimization algorithms for innovation estimation. For models with complete observations free of noise, various approximations to the quasi-maximum likelihood estimator are implemented in R.[21]

References

Template:Reflist

  1. ↑ 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 Template:Citation
  2. ↑ 2.0 2.1 2.2 2.3 Jazwinski A.H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.
  3. ↑ 3.0 3.1 3.2 3.3 Template:Cite journal
  4. ↑ Kailath T., Lectures on Wiener and Kalman Filtering. New York: Springer-Verlag, 1981.
  5. ↑ 5.0 5.1 5.2 5.3 5.4 Template:Cite journal
  6. ↑ 6.0 6.1 6.2 6.3 6.4 Template:Cite journal
  7. ↑ Template:Cite journal
  8. ↑ Ljung L., System Identification, Theory for the User (2nd edn). Englewood Cliffs: Prentice Hall, 1999.
  9. ↑ Template:Cite journal
  10. ↑ 10.0 10.1 Nolsoe K., Nielsen, J.N., Madsen H. (2000) "Prediction-based estimating function for diffusion processes with measurement noise", Technical Reports 2000, No. 10, Informatics and Mathematical Modelling, Technical University of Denmark.
  11. ↑ 11.0 11.1 11.2 Template:Cite journal
  12. ↑ 12.00 12.01 12.02 12.03 12.04 12.05 12.06 12.07 12.08 12.09 12.10 12.11 Template:Cite journal
  13. ↑ 13.0 13.1 13.2 13.3 13.4 13.5 Template:Cite journal
  14. ↑ Template:Cite journal
  15. ↑ Template:Cite journal
  16. ↑ 16.0 16.1 Template:Cite journal
  17. ↑ Template:Cite journal
  18. ↑ Template:Cite book
  19. ↑ Kloeden P.E., Platen E., Numerical Solution of Stochastic Differential Equations, 3rd edn. Berlin: Springer, 1999.
  20. ↑ Template:Cite web
  21. ↑ Iacus S.M., Simulation and inference for stochastic differential equations: with R examples, New York: Springer, 2008.