
Innovation method

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
{{Short description|Statistical estimation method}}


In statistics, the '''Innovation Method''' provides an estimator for the parameters of stochastic differential equations given a time series of (partial and noisy) discrete observations of the state variables. In the framework of continuous-discrete state space models, the '''innovation estimator''' is obtained by the maximization of the likelihood of the corresponding discrete-time innovation process with respect to the parameters. The innovation estimator can be classified as a maximum likelihood, a quasi-maximum likelihood or a prediction error estimator, depending on the inferential considerations to be emphasized. The innovation method is a system identification technique for developing mathematical models of dynamical systems from measured data and for the optimal design of experiments.


==Background==
Stochastic differential equations (SDEs) have become an important mathematical tool for describing the time evolution of random phenomena in many fields of science. Statistical inference for SDEs is thus of great importance in applications for model building, model selection, identification and forecasting. To carry out statistical inference for SDEs, discrete observations of the state variables of these random phenomena are indispensable. Usually, in practice, only a few state variables are measured by physical devices that introduce random measurement errors (observational noise).


===Mathematical model for inference===
The innovation estimator<ref name=":1">{{Citation |last=Ozaki |first=Tohru |title=The Local Linearization Filter with Application to Nonlinear System Identifications |date=1994 |url=https://doi.org/10.1007/978-94-011-0854-6_10 |work=Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach: Volume 3 Engineering and Scientific Applications |pages=217–240 |editor-last=Bozdogan |editor-first=H. |access-date=2023-07-06 |place=Dordrecht |publisher=Springer Netherlands |language=en |doi=10.1007/978-94-011-0854-6_10 |isbn=978-94-011-0854-6 |editor2-last=Sclove |editor2-first=S. L. |editor3-last=Gupta |editor3-first=A. K. |editor4-last=Haughton |editor4-first=D.}}</ref> for SDEs is defined in the framework of continuous-discrete state space models.<ref name=":2">Jazwinski A.H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.</ref> These models arise as a natural mathematical representation of the temporal evolution of continuous random phenomena and their measurements in a succession of time instants. In the simplest formulation, these continuous-discrete models<ref name=":2"></ref> are expressed in terms of an SDE of the form




=== Statistical problem to solve ===
Once the dynamics of a phenomenon is described by a state equation as (1) and the way the state variables are measured is specified by an observation equation as (2), the inference problem to solve is the following:<ref name=":1" /><ref name=":3">{{Cite journal |last=Nielsen |first=Jan Nygaard |last2=Vestergaard |first2=Martin |year=2000 |title=Estimation in continuous-time stochastic volatility models using nonlinear filters |url=https://www.worldscientific.com/doi/abs/10.1142/S0219024900000139 |journal=International Journal of Theoretical and Applied Finance |volume=03 |issue=02 |pages=279–308 |doi=10.1142/S0219024900000139 |issn=0219-0249}}</ref> given <math>M</math> partial and noisy observations <math>\mathbf{z}_{t_0},...,\mathbf{z}_{t_{M-1}}</math> of the stochastic process <math>\mathbf{x}</math> at the observation times <math>t_0,...,t_{M-1}</math>, estimate the unobserved state variables of <math>\mathbf{x}</math> and the unknown parameters <math>\theta</math> in (1) that best fit the given observations.


== Discrete-time innovation process ==
<math>\qquad \qquad \nu_{t_k} = \mathbf{z}_{t_k}- \mathbf{Cx}_{{t_k}/{t_{k-1}}}(\theta), \qquad \qquad (3) </math>


defines the discrete-time innovation process,<ref name=":12">Kailath T., Lectures on Wiener and Kalman Filtering. New York: Springer-Verlag, 1981.</ref><ref name=":1"></ref><ref name=":5"></ref> where <math>\nu_{t_k} </math> is proved to be an independent Gaussian random vector with zero mean and variance


<math>\qquad \qquad \Sigma_{t_k}= \mathbf{CU}_{{t_k}/t_{k-1}}(\theta)\ \mathbf{C}^\intercal + \Pi_{t_k}, \qquad \qquad (4) </math>


for small enough <math>\Delta=\underset{k}{\max } \{t_{k+1}-t_k\}</math>, with <math>t_k,t_{k+1} \in \{t\}_{M}</math>. In practice,<ref name=":10">{{Cite journal |last=Jimenez |first=J. C. |last2=Yoshimoto |first2=A. |last3=Miwakeichi |first3=F. |date=2021-08-24 |title=State and parameter estimation of stochastic physical systems from uncertain and indirect measurements |url=https://doi.org/10.1140/epjp/s13360-021-01859-1 |journal=The European Physical Journal Plus |language=en |volume=136 |pages=869 |doi=10.1140/epjp/s13360-021-01859-1 |issn=2190-5444}}</ref> this distribution for the discrete-time innovation is valid when, with a suitable selection of both the number <math>M</math> of observations and the time distance <math>t_{k+1}-t_k</math> between consecutive observations, the time series of observations <math>\mathbf{z}_{t_0},...,\mathbf{z}_{t_{M-1}}</math> of the SDE contains the main information about the continuous-time process <math>\mathbf{x}</math>. That is, when the sampling of the continuous-time process <math>\mathbf{x}</math> has low distortion (aliasing) and when there is a suitable signal-to-noise ratio.
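As a concrete illustration (not from the article), the innovation (3) and its variance (4) can be computed exactly for a linear model, for which the exact filter is the classical Kalman filter. The scalar Ornstein-Uhlenbeck model, the parameter values and the function names below are illustrative assumptions; for a correctly specified model the standardized innovations form an approximately white, unit-variance sequence, as stated above.

```python
import numpy as np

# Hypothetical example: scalar Ornstein-Uhlenbeck state
#   dx = -theta * x dt + sigma dW,   z_k = x(t_k) + e_k,  e_k ~ N(0, pi_obs).
# For this linear SDE the exact discrete transition is known, so the
# discrete-time innovations (3) and their variances (4) are computed exactly.

def innovations(z, dt, theta, sigma, pi_obs, x0, u0):
    """One Kalman filter pass returning innovations nu_k and variances Sigma_k."""
    a = np.exp(-theta * dt)                    # exact transition coefficient
    q = sigma**2 * (1 - a**2) / (2 * theta)    # transition noise variance
    x, u = x0, u0                              # filter mean and variance
    nus, sigmas = [], []
    for zk in z[1:]:
        x_pred = a * x                         # prediction mean  x_{t_k/t_{k-1}}
        u_pred = a**2 * u + q                  # prediction var   U_{t_k/t_{k-1}}
        nu = zk - x_pred                       # innovation (3)
        s = u_pred + pi_obs                    # innovation variance (4)
        k_gain = u_pred / s                    # filter update
        x = x_pred + k_gain * nu
        u = (1 - k_gain) * u_pred
        nus.append(nu)
        sigmas.append(s)
    return np.array(nus), np.array(sigmas)

# simulate synthetic observations and inspect the standardized innovations
rng = np.random.default_rng(0)
theta, sigma, pi_obs, dt, M = 1.0, 0.5, 0.01, 0.1, 5000
a = np.exp(-theta * dt)
q = sigma**2 * (1 - a**2) / (2 * theta)
x = np.zeros(M)
for k in range(1, M):                          # exact simulation of the OU state
    x[k] = a * x[k-1] + np.sqrt(q) * rng.standard_normal()
z = x + np.sqrt(pi_obs) * rng.standard_normal(M)

nu, s = innovations(z, dt, theta, sigma, pi_obs, x0=z[0], u0=pi_obs)
std = nu / np.sqrt(s)                          # standardized innovations
print(np.mean(std), np.var(std))               # approximately 0 and 1 (white sequence)
```

With the true parameters, the mean of `std` is near 0, its variance near 1, and consecutive values are nearly uncorrelated, which is the whiteness property the text describes.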


== Innovation estimator ==
The innovation estimator for the parameters of the SDE (1) is the one that maximizes the likelihood function of the discrete-time innovation process <math>\{\nu_{t_k}\}_ {k=1,\ldots,M-1}</math> with respect to the parameters.<ref name=":1"/> More precisely, given <math>M</math> measurements <math>Z_{t_{M-1}}</math> of the state space model (1)-(2) with <math>\theta = \theta_0 </math> on <math>\{t\}_M,</math> the '''innovation estimator''' for the parameters <math>\theta_0</math> of (1) is defined by


<math>\qquad\qquad
\widehat{\theta }_{M}=\arg \min_{\theta }U_{M}(\theta ,Z_{t_{M-1}}),
\qquad \qquad (5)
</math>

where, up to an additive constant not depending on <math>\theta</math>,

<math>\qquad\qquad
U_{M}(\theta ,Z_{t_{M-1}})=\sum_{k=1}^{M-1}\left( \ln (\det (\Sigma _{t_{k}}))+\nu _{t_{k}}^{\intercal }\Sigma _{t_{k}}^{-1}\nu _{t_{k}}\right)
</math>

is minus twice the log-likelihood of the Gaussian innovations (3) with variances (4).
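A minimal numerical sketch of the innovation estimator (5) follows. The scalar Ornstein-Uhlenbeck model (dx = -theta x dt + sigma dW, observed with additive noise), the parameter values, the grid search and the function name `U_M` are illustrative assumptions, not part of the article; for this linear model the Kalman filter supplies the exact innovations.

```python
import numpy as np

# Hypothetical sketch: minimize the negative log-likelihood U_M of the
# Gaussian innovations over the drift parameter theta of
#   dx = -theta * x dt + sigma dW,   z_k = x(t_k) + e_k,  e_k ~ N(0, pi_obs).

def U_M(theta, z, dt, sigma, pi_obs):
    """Innovation-based negative log-likelihood via an exact Kalman pass."""
    a = np.exp(-theta * dt)                    # exact discrete transition
    q = sigma**2 * (1 - a**2) / (2 * theta)    # transition noise variance
    x, u = z[0], pi_obs                        # initial filter estimates
    total = 0.0
    for zk in z[1:]:
        x_pred, u_pred = a * x, a**2 * u + q   # prediction step
        nu = zk - x_pred                       # innovation (3)
        s = u_pred + pi_obs                    # innovation variance (4)
        total += np.log(2 * np.pi) + np.log(s) + nu**2 / s
        k_gain = u_pred / s                    # filter update step
        x, u = x_pred + k_gain * nu, (1 - k_gain) * u_pred
    return total

# simulate observations from the true parameter theta0 = 1.0
rng = np.random.default_rng(1)
theta0, sigma, pi_obs, dt, M = 1.0, 0.5, 0.01, 0.1, 2000
a0 = np.exp(-theta0 * dt)
q0 = sigma**2 * (1 - a0**2) / (2 * theta0)
x = np.zeros(M)
for k in range(1, M):
    x[k] = a0 * x[k-1] + np.sqrt(q0) * rng.standard_normal()
z = x + np.sqrt(pi_obs) * rng.standard_normal(M)

# innovation estimator: minimize U_M over a grid of candidate parameters
grid = np.linspace(0.3, 2.0, 171)
theta_hat = grid[np.argmin([U_M(t, z, dt, sigma, pi_obs) for t in grid])]
print(theta_hat)   # close to theta0 = 1.0
```

A grid search stands in for a proper numerical optimizer here; any smooth optimizer applied to `U_M` would serve the same role.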


=== Differences with the maximum likelihood estimator ===
The maximum likelihood estimator of the parameters <math>\theta </math> in the model (1)-(2) involves the evaluation of the, usually unknown, transition density function <math>p_\theta (t_{k+1}- t_k, \mathbf{x}(t_k), \mathbf{x}(t_{k+1}))</math> between the states <math>\mathbf{x}(t_k)</math> and <math> \mathbf{x}(t_{k+1})</math> of the diffusion process <math>\mathbf{x}</math> for all the observation times <math>t_k </math> and <math> t_{k+1}</math>.<ref name=":4">{{Cite journal |last=Schweppe |first=F. |date=1965 |title=Evaluation of likelihood functions for Gaussian signals |url=https://ieeexplore.ieee.org/document/1053737/ |journal=IEEE Transactions on Information Theory |volume=11 |issue=1 |pages=61–70 |doi=10.1109/TIT.1965.1053737 |issn=1557-9654}}</ref> Instead, the innovation estimator (5) is obtained by maximizing the likelihood of the discrete-time innovation process <math>\{ \nu_{t_k} \}_{k=1,...,M-1} , </math> taking into account that <math>\nu _{t_1},...,\nu _{t_{M-1}}</math> are Gaussian and independent random vectors. Remarkably, whereas the transition density function <math> p_{\theta}(t_{k+1}-t_k, \mathbf{x}(t_k),\mathbf{x}(t_{k+1}))</math> changes when the SDE for <math>\mathbf{x}</math> does, the transition density function <math>\mathfrak{p}_\theta(t_{k+1}-t_k, \nu_{t_{k}},\nu_{t_{k+1}} )</math> for the innovation process remains Gaussian independently of the SDE for <math>\mathbf{x}</math>. Only when the diffusion <math>\mathbf{x}</math> is described by a linear SDE with additive noise is the density function <math>p_\theta(t_{k+1}-t_k, \mathbf{x}(t_k),\mathbf{x}(t_{k+1}))</math> Gaussian and equal to <math>\mathfrak{p}_\theta(t_{k+1}-t_k, \nu_{t_k}, \nu_{t_{k+1}}),</math> in which case the maximum likelihood and the innovation estimator coincide.<ref name=":5">{{Cite journal |last=Jimenez |first=J. C. |last2=Ozaki |first2=T. |date=2006 |title=An Approximate Innovation Method For The Estimation Of Diffusion Processes From Discrete Data |url=https://onlinelibrary.wiley.com/doi/10.1111/j.1467-9892.2005.00454.x |journal=Journal of Time Series Analysis |language=en |volume=27 |issue=1 |pages=77–97 |doi=10.1111/j.1467-9892.2005.00454.x |issn=0143-9782}}</ref> Otherwise,<ref name=":5" /> the innovation estimator is an approximation to the maximum likelihood estimator and, in this sense, it is a quasi-maximum likelihood estimator. In addition, the innovation method is a particular instance of the prediction error method according to the definition given by Ljung.<ref name=":6">Ljung L., System Identification, Theory for the User (2nd edn). Englewood Cliffs: Prentice Hall, 1999.</ref> Therefore, the asymptotic results obtained for that general class of estimators<ref name=":7">{{Cite journal |last=Ljung |first=Lennart |last2=Caines |first2=Peter E. |date=1980 |title=Asymptotic normality of prediction error estimators for approximate system models |url=http://www.tandfonline.com/doi/abs/10.1080/17442507908833135 |journal=Stochastics |language=en |volume=3 |issue=1-4 |pages=29–46 |doi=10.1080/17442507908833135 |issn=0090-9491}}</ref> are valid for the innovation estimators.<ref name=":1" /><ref name=":8">Nolsoe K., Nielsen, J.N., Madsen H. (2000) "Prediction-based estimating function for diffusion processes with measurement noise", Technical Reports 2000, No. 10, Informatics and Mathematical Modelling, Technical University of Denmark.</ref> Intuitively, by following the typical control engineering viewpoint, it is expected that the innovation process, viewed as a measure of the prediction errors of the fitted model, be approximately a white noise process when the model fits the data,<ref name=":9">{{Cite journal |last=Ozaki |first=T. |last2=Jimenez |first2=J. C. |last3=Haggan-Ozaki |first3=V. |date=2000 |title=The Role of the Likelihood Function in the Estimation of Chaos Models |url=https://onlinelibrary.wiley.com/doi/10.1111/1467-9892.00189 |journal=Journal of Time Series Analysis |language=en |volume=21 |issue=4 |pages=363–387 |doi=10.1111/1467-9892.00189 |issn=0143-9782}}</ref><ref name=":3"></ref> which can be used as a practical tool for the design of models and for optimal experimental design.<ref name=":10"></ref>


=== Properties ===
The '''innovation estimator''' (5) has a number of important attributes:


* Under conventional regularity conditions, the innovation estimator (5) is consistent and asymptotically normal.<ref name=":1"/><ref name=":8" /><ref name=":11">{{Cite journal |last=Jimenez |first=J.C. |year=2020 |title=Bias reduction in the estimation of diffusion processes from discrete observations |journal=IMA Journal of Mathematical Control and Information |volume=37 |issue=4 |pages=1468–1505 |doi=10.1093/imamci/dnaa021}}</ref>


*For ]<ref name=":9"/>, the maximum log-likelihood <math>-U_{M,h} ( \widehat{\theta}_M, Z_{t_{M-1}} )</math> of the innovation estimator (5) can be used to compute the ]. *For ]<ref name=":9"/>, the maximum log-likelihood <math>-U_{M,h} ( \widehat{\theta}_M, Z_{t_{M-1}} )</math> of the innovation estimator (5) can be used to compute the ].


* The whiteness of the fitting-innovation process <math>\{\mathbf{\nu }_{t_{k}}:\mathbf{\nu }_{t_{k}}=\mathbf{z}_{t_{k}}-\mathbf{Cx}_{t_{k}/t_{k-1}}(\widehat{\theta }_{M})\}_{k=1,\ldots M-1}</math> measures the goodness of fit of the model to the data.<ref name=":1"></ref><ref name=":3"></ref><ref name=":9"></ref><ref name=":10"></ref>


* For smooth enough function <math>\mathbf{h}</math>, nonlinear observation equations of the form
</math>


can be transformed to the simpler one (2), and the innovation estimator (5) can be applied.<ref name=":5"></ref>


== Approximate Innovation estimators ==
In practice, closed-form expressions for computing <math>\mathbf{x}_{t_{k}/t_{k-1}}(\theta )</math> and <math>\mathbf{U}_{t_{k}/t_{k-1}}(\theta )</math> in (5) are only available for a few models (1)-(2). Therefore, approximate filtering algorithms such as the following are used in applications.


Given <math>M</math> measurements <math>Z_{t_{M-1}}</math> and the initial filter estimates <math>\mathbf{y}_{t_{0}/t_{0}}=\mathbf{x}_{t_{0}/t_{0}}</math>, <math> \mathbf{V}_ {t_{0}/t_{0}}=\mathbf{U}_{t_{0}/t_{0}}</math>, the '''approximate Linear Minimum Variance (LMV) filter''' for the model (1)-(2) is iteratively defined at each observation time <math>t_{k}\in \{t\}_{M}</math> by the prediction estimates<ref name=":2"></ref><ref name=":13">{{Cite journal |last=Jimenez |first=J.C. |year=2019 |title=Approximate linear minimum variance filters for continuous-discrete state space models: convergence and practical adaptive algorithms |url=https://academic.oup.com/imamci/article/36/2/341/4634018 |journal=IMA Journal of Mathematical Control and Information |volume=36 |issue=2 |pages=341–378 |doi=10.1093/imamci/dnx047}}</ref>


<math> \qquad \qquad


Conventional-type innovation estimators are those (9) derived from conventional-type continuous-discrete or discrete-discrete approximate filtering algorithms. Among the approximate continuous-discrete filters are the innovation estimators based on Local Linearization (LL) filters,<ref name=":1"></ref><ref name=":14">{{Cite journal |last=Shoji |first=Isao |date=1998 |title=A comparative study of maximum likelihood estimators for nonlinear dynamical system models |url=https://www.tandfonline.com/doi/full/10.1080/002071798221731 |journal=International Journal of Control |language=en |volume=71 |issue=3 |pages=391–404 |doi=10.1080/002071798221731 |issn=0020-7179}}</ref><ref name=":5"></ref> on the extended Kalman filter,<ref name=":15">{{Cite journal |last=Nielsen |first=Jan Nygaard |last2=Madsen |first2=Henrik |date=2001-01-01 |title=Applying the EKF to stochastic differential equations with level effects |url=https://www.sciencedirect.com/science/article/pii/S000510980000128X |journal=Automatica |language=en |volume=37 |issue=1 |pages=107–112 |doi=10.1016/S0005-1098(00)00128-X |issn=0005-1098}}</ref><ref name=":16">{{Cite journal |last=Singer |first=Hermann |date=2002 |title=Parameter Estimation of Nonlinear Stochastic Differential Equations: Simulated Maximum Likelihood versus Extended Kalman Filter and Itô-Taylor Expansion |url=http://www.tandfonline.com/doi/abs/10.1198/106186002808 |journal=Journal of Computational and Graphical Statistics |language=en |volume=11 |issue=4 |pages=972–995 |doi=10.1198/106186002808 |issn=1061-8600}}</ref> and on the second-order filters.<ref name=":3"></ref><ref name=":16"></ref> Approximate innovation estimators based on discrete-discrete filters result from the discretization of the SDE (1) by means of a numerical scheme.<ref name=":17">{{Cite journal |last=Ozaki |first=Tohru |last2=Iino |first2=Mitsunori |date=2001 |title=An innovation approach to non-Gaussian time series analysis |url=https://www.cambridge.org/core/journals/journal-of-applied-probability/article/abs/an-innovation-approach-to-nongaussian-time-series-analysis/B22EEB7243BC878CEB6DA367B0EEE7F5 |journal=Journal of Applied Probability |language=en |volume=38 |issue=A |pages=78–92 |doi=10.1239/jap/1085496593 |issn=0021-9002}}</ref><ref name=":18">{{Cite journal |last=Peng |first=H. |last2=Ozaki |first2=T. |last3=Jimenez |first3=J.C. |date=2002 |title=Modeling and control for foreign exchange based on a continuous time stochastic microstructure model |url=https://ieeexplore.ieee.org/document/1185071/ |journal=Proceedings of the 41st IEEE Conference on Decision and Control, 2002 |volume=4 |pages=4440–4445 |doi=10.1109/CDC.2002.1185071}}</ref> Typically, the effectiveness of these innovation estimators is directly related to the convergence of the involved filtering algorithms.


A shared drawback of these conventional-type filters is that, once the observations are given, the error between the approximate and the exact innovation process is fixed and completely settled by the time distance between observations.<ref name=":11"></ref> This might set a large bias of the approximate innovation estimators in some applications, a bias that cannot be corrected by increasing the number of observations. However, the conventional-type innovation estimators are useful in many practical situations for which only medium or low accuracy for the parameter estimation is required.<ref name=":11"></ref>
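To make the discrete-discrete construction concrete, the following sketch (not from the article) replaces a nonlinear SDE by its Euler scheme between observations and runs an extended-Kalman-type recursion to obtain approximate innovations. The model dx = -x^3 dt + sigma dW, the parameter values and the function name are illustrative assumptions.

```python
import numpy as np

# Hypothetical conventional-type (discrete-discrete) approximate filter:
# the SDE dx = f(x) dt + sigma dW is discretized by the Euler scheme with
# stepsize equal to the observation spacing, and a linearized (EKF-style)
# recursion supplies the approximate innovations used in the estimator.

def euler_ekf_innovations(z, dt, f, df, sigma, pi_obs):
    x, u = z[0], pi_obs                        # crude initial filter estimates
    nus, sigmas = [], []
    for zk in z[1:]:
        x_pred = x + f(x) * dt                 # Euler prediction of the mean
        a = 1.0 + df(x) * dt                   # linearized transition coefficient
        u_pred = a**2 * u + sigma**2 * dt      # Euler prediction of the variance
        nu = zk - x_pred                       # approximate innovation
        s = u_pred + pi_obs                    # approximate innovation variance
        k_gain = u_pred / s
        x, u = x_pred + k_gain * nu, (1 - k_gain) * u_pred
        nus.append(nu)
        sigmas.append(s)
    return np.array(nus), np.array(sigmas)

# toy run on synthetic data from dx = -x^3 dt + 0.5 dW (Euler-simulated)
rng = np.random.default_rng(2)
dt, M, sigma, pi_obs = 0.05, 400, 0.5, 0.01
x = np.zeros(M)
for k in range(1, M):
    x[k] = x[k-1] - x[k-1]**3 * dt + sigma * np.sqrt(dt) * rng.standard_normal()
z = x + np.sqrt(pi_obs) * rng.standard_normal(M)

nu, s = euler_ekf_innovations(z, dt, lambda v: -v**3, lambda v: -3 * v**2,
                              sigma, pi_obs)
print(nu.shape, bool(np.all(s > 0)))
```

Because the Euler stepsize is tied to the observation spacing, the discretization error of these innovations is fixed once the data are given, which is exactly the drawback discussed above.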


=== Order-β innovation estimators ===
</math>


for all <math>t_{k},t_{k+1}\in \{t\}_{M}</math> and any <math>2(\beta +1)</math> times continuously differentiable function <math>g:\mathbb{R}^{d}\rightarrow \mathbb{R}</math> for which <math>g</math> and all its partial derivatives up to order <math>2(\beta +1)</math> have polynomial growth, with <math>L_{k}</math> a positive constant. This order-<math>\beta </math> LMV filter converges with rate <math>\beta </math> to the exact LMV filter as <math>h</math> goes to zero,<ref name=":13"></ref> where <math>h</math> is the maximum stepsize of the time discretization <math>(\tau )_{h}\supset \{t\}_{M}</math> on which the approximation <math>\mathbf{y}</math> to <math>\mathbf{x}</math> is defined.


An '''order-'''<math>\beta </math> '''innovation estimator''' is an approximate innovation estimator (9) in which the approximations to the discrete-time innovation (3) and innovation variance (4) result from an order-<math>\beta </math> LMV filter.<ref name=":11" />


Approximations <math>\mathbf{y}</math> of any kind converging to <math>\mathbf{x}</math> in a weak sense (as, e.g., those in <ref name=":19">Kloeden P.E., Platen E., Numerical Solution of Stochastic Differential Equations, 3rd edn. Berlin: Springer, 1999.</ref><ref name=":13"></ref>) can be used to design an order-<math>\beta </math> LMV filter and, consequently, an order-<math>\beta </math> innovation estimator. These order-<math>\beta </math> innovation estimators are intended for the recurrent practical situation in which a diffusion process must be identified from a reduced number of observations distant in time, or in which high accuracy for the estimated parameters is required.
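The mechanism behind these estimators can be sketched numerically (this example is not from the article): between two distant observation times, the prediction moments of the filter are computed from a weak Euler approximation (weak order 1) on a finer inner grid with stepsize h, and the error decreases as h does. The drift f(y) = -y, the stepsizes and the function name are illustrative assumptions; for this linear drift the exact prediction mean is known, so the convergence is visible directly.

```python
import numpy as np

# Hypothetical sketch of an order-beta prediction: Monte Carlo moments of a
# weak Euler approximation of dy = f(y) dt + sigma dW on an inner grid of
# stepsize h, propagated over one inter-observation interval of length T.

def mc_prediction_moments(mean0, var0, T, h, n_paths, f, sigma, rng):
    """Monte Carlo mean and variance of the weak Euler approximation at T."""
    y = mean0 + np.sqrt(var0) * rng.standard_normal(n_paths)
    for _ in range(int(round(T / h))):
        y = y + f(y) * h + sigma * np.sqrt(h) * rng.standard_normal(n_paths)
    return y.mean(), y.var()

# For f(y) = -y the exact prediction mean after T = 1 is mean0 * exp(-1).
rng = np.random.default_rng(3)
m_coarse = mc_prediction_moments(1.0, 0.0, 1.0, 0.5, 200000,
                                 lambda v: -v, 0.3, rng)[0]
m_fine = mc_prediction_moments(1.0, 0.0, 1.0, 0.01, 200000,
                               lambda v: -v, 0.3, rng)[0]
print(abs(m_coarse - np.exp(-1)), abs(m_fine - np.exp(-1)))  # error shrinks with h
```

Unlike the conventional-type construction, here the inner stepsize h is decoupled from the observation spacing, which is what allows the bias of the resulting innovation estimator to be reduced.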
==== Properties ====


An '''order-'''<math>\beta </math> '''innovation estimator''' <math>\widehat{\mathbf{\theta }}_{M}(h)</math> has a number of important properties:<ref name=":11" /><ref name=":10" />


* For each given data <math>Z_{t_{M-1}}</math> of <math>M</math> observations, <math>\widehat{\mathbf{\theta }}_{M}(h)</math> converges to the exact innovation estimator <math>\widehat{\mathbf{\theta }}_{M}</math> as the maximum stepsize <math>h</math> of the time discretization <math>\left( \tau \right) _{h}\supset \{t\}_{M}</math> goes to zero.
==== Deterministic approximations ====


The order-<math>\beta </math> innovation estimators overcome the drawback of the conventional-type innovation estimators concerning the impossibility of reducing bias.<ref name=":11" /> However, the viable bias reduction of an order-<math>\beta </math> innovation estimator might eventually require that the associated order-<math>\beta </math> LMV filter performs a large number of stochastic simulations.<ref name=":13" /> In situations where only low or medium precision approximate estimators are needed, an alternative deterministic filter algorithm - called the deterministic order-<math>\beta </math> LMV filter<ref name=":13" /> - can be obtained by tracking the first two conditional moments <math>\mu</math> and <math>\Lambda</math> of the order-<math>\beta </math> weak approximation <math>\mathbf{y}</math> at all the time instants <math>\tau _{n}\in \left( \tau \right) _{h}</math> in between two consecutive observation times <math>t_{k}</math> and <math>t_{k+1}</math>. That is, the values of the predictions <math>\mathbf{y}_{t_{k+1}/t_{k}}</math> and <math>\mathbf{P}_{t_{k+1}/t_{k}}</math> in the filtering algorithm are computed from recursive formulas for these conditional moments.


== Software ==


A [[MATLAB]] implementation of various approximate innovation estimators is provided by the '''SdeEstimation toolbox'''.<ref name=":21">{{Cite web |title=GitHub - locallinearization/SdeEstimation |url=https://github.com/locallinearization/SdeEstimation |access-date=2023-07-06 |website=GitHub |language=en}}</ref> This toolbox has Local Linearization filters, including deterministic and stochastic options with fixed step sizes and sample numbers. It also offers adaptive time stepping and sampling algorithms, along with local and global optimization algorithms for innovation estimation. For models with complete observations free of noise, various approximations to the Quasi-Maximum Likelihood estimator are implemented in [[R (programming language)|R]].<ref name=":20">Iacus S.M., Simulation and inference for stochastic differential equations: with R examples, New York: Springer, 2008.</ref>


== References ==

Revision as of 01:51, 6 July 2023


In statistics, the Innovation Method provides an estimator for the parameters of stochastic differential equations given a time series of (potentially noisy) observations of the state variables. In the framework of continuous-discrete state space models, the innovation estimator is obtained by maximizing the log-likelihood of the corresponding discrete-time innovation process with respect to the parameters. The innovation estimator can be classified as an M-estimator, a quasi-maximum likelihood estimator or a prediction error estimator depending on the inferential considerations to be emphasized. The innovation method is a system identification technique for developing mathematical models of dynamical systems from measured data and for the optimal design of experiments.

Background

Stochastic differential equations (SDEs) have become an important mathematical tool for describing the time evolution of several random phenomena in natural, social and applied sciences. Statistical inference for SDEs is thus of great importance in applications for model building, model selection, model identification and forecasting. To carry out statistical inference for SDEs, measurements of the state variables of these random phenomena are indispensable. Usually, in practice, only a few state variables are measured by physical devices that introduce random measurement errors (observational errors).

Mathematical model for inference

The innovation estimator for SDEs is defined in the framework of continuous-discrete state space models. These models arise as a natural mathematical representation of the temporal evolution of continuous random phenomena and their measurements at a succession of time instants. In the simplest formulation, these continuous-discrete models are expressed in terms of an SDE of the form

<math>d\mathbf{x}(t)=\mathbf{f}(t,\mathbf{x}(t);\theta )\,dt+\sum _{i=1}^{m}\mathbf{g}_{i}(t,\mathbf{x}(t);\theta )\,d\mathbf{w}^{i}(t)\qquad (1)</math>

describing the time evolution of <math>d</math> state variables <math>\mathbf{x}</math> of the phenomenon for all time instants <math>t\geq t_{0}</math>, and an observation equation

<math>\mathbf{z}_{t_{k}}=\mathbf{C}\mathbf{x}(t_{k})+\mathbf{e}_{t_{k}}\qquad (2)</math>

describing the time series of measurements <math>\mathbf{z}_{t_{0}},\ldots ,\mathbf{z}_{t_{M-1}}</math> of at least one of the variables <math>\mathbf{x}</math> of the random phenomenon at <math>M</math> time instants <math>t_{0},\ldots ,t_{M-1}</math>. In the model (1)-(2), <math>\mathbf{f}</math> and <math>\mathbf{g}_{i}</math> are differentiable functions, <math>\mathbf{w}=(\mathbf{w}^{1},\ldots ,\mathbf{w}^{m})</math> is an <math>m</math>-dimensional standard Wiener process, <math>\theta \in \mathbb{R}^{p}</math> is a vector of <math>p</math> parameters, <math>\{\mathbf{e}_{t_{k}}:\mathbf{e}_{t_{k}}\sim \mathrm{N}(0,\Pi _{t_{k}})\}_{k=0,\ldots ,M-1}</math> is a sequence of <math>r</math>-dimensional i.i.d. Gaussian random vectors independent of <math>\mathbf{w}</math>, <math>\Pi _{t_{k}}</math> an <math>r\times r</math> positive definite matrix, and <math>\mathbf{C}</math> an <math>r\times d</math> matrix.

Statistical problem to solve

Once the dynamics of a phenomenon is described by a state equation like (1) and the way the state variables are measured is specified by an observation equation like (2), the inference problem to solve is the following: given <math>M</math> partial and noisy observations <math>\mathbf{z}_{t_{0}},\ldots ,\mathbf{z}_{t_{M-1}}</math> of the stochastic process <math>\mathbf{x}</math> at the observation times <math>t_{0},\ldots ,t_{M-1}</math>, estimate the unobserved state variables of <math>\mathbf{x}</math> and the unknown parameters <math>\theta </math> in (1) that better fit the given observations.

Discrete-time innovation process

Let <math>\{t\}_{M}</math> be the sequence of <math>M</math> observation times <math>t_{0},\ldots ,t_{M-1}</math> of the states of (1), and <math>Z_{\rho }=\{\mathbf{z}_{t_{k}}:t_{k}\leq \rho ,\,t_{k}\in \{t\}_{M}\}</math> the time series of partial and noisy measurements of <math>\mathbf{x}</math> described by the observation equation (2).

Further, let <math>\mathbf{x}_{t/\rho }=\mathrm{E}(\mathbf{x}(t)|Z_{\rho })</math> and <math>\mathbf{U}_{t/\rho }=E(\mathbf{x}(t)\mathbf{x}^{\intercal }(t)|Z_{\rho })-\mathbf{x}_{t/\rho }\mathbf{x}_{t/\rho }^{\intercal }</math> be the conditional mean and variance of <math>\mathbf{x}</math> with <math>\rho \leq t</math>, where <math>E(\cdot )</math> denotes the expected value of random vectors.

The random sequence <math>\{\nu _{t_{k}}\}_{k=1,\ldots ,M-1}</math>, with

<math>\nu _{t_{k}}=\mathbf{z}_{t_{k}}-\mathbf{C}\mathbf{x}_{t_{k}/t_{k-1}}(\theta ),\qquad (3)</math>

defines the discrete-time innovation process, where <math>\nu _{t_{k}}</math> is proved to be an independent, normally distributed random vector with zero mean and variance

<math>\Sigma _{t_{k}}=\mathbf{C}\mathbf{U}_{t_{k}/t_{k-1}}(\theta )\,\mathbf{C}^{\intercal }+\Pi _{t_{k}},\qquad (4)</math>

for small enough <math>\Delta ={\max _{k}}\{t_{k+1}-t_{k}\}</math>, with <math>t_{k},t_{k+1}\in \{t\}_{M}</math>. In practice, this distribution for the discrete-time innovation is valid when, with a suitable selection of both the number <math>M</math> of observations and the time distance <math>t_{k+1}-t_{k}</math> between consecutive observations, the time series of observations <math>\mathbf{z}_{t_{0}},\ldots ,\mathbf{z}_{t_{M-1}}</math> of the SDE contains the main information about the continuous-time process <math>\mathbf{x}</math>. That is, when the sampling of the continuous-time process <math>\mathbf{x}</math> has low distortion (aliasing) and when there is a suitable signal-to-noise ratio.
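For illustration, the innovation (3) and its variance (4) can be assembled from the one-step predictions of a filtering algorithm. The following Python sketch does this with NumPy; the function and argument names are illustrative, not part of any cited implementation:

```python
import numpy as np

def innovations(z, x_pred, U_pred, C, Pi):
    """Discrete-time innovations (3) and their variances (4).

    z      : (K, r) observations z_{t_1}, ..., z_{t_{M-1}}
    x_pred : (K, d) one-step predictions x_{t_k/t_{k-1}}(theta)
    U_pred : (K, d, d) prediction variances U_{t_k/t_{k-1}}(theta)
    C      : (r, d) observation matrix
    Pi     : (r, r) observation-noise variance
    """
    nu = z - x_pred @ C.T              # eq. (3), batched over k
    Sigma = C @ U_pred @ C.T + Pi      # eq. (4), batched over k
    return nu, Sigma
```

The predictions themselves must come from an (exact or approximate) continuous-discrete filter, as discussed below.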

Innovation estimator

The innovation estimator for the parameters of the SDE (1) is the one that maximizes the likelihood function of the discrete-time innovation process <math>\{\nu _{t_{k}}\}_{k=1,\ldots ,M-1}</math> with respect to the parameters. More precisely, given <math>M</math> measurements <math>Z_{t_{M-1}}</math> of the state space model (1)-(2) with <math>\theta =\theta _{0}</math> on <math>\{t\}_{M}</math>, the innovation estimator for the parameters <math>\theta _{0}</math> of (1) is defined by

<math>\hat{\theta }_{M}=\arg \{\min _{\theta }\ U_{M}(\theta ,Z_{t_{M-1}})\},\qquad (5)</math>

where

<math>U_{M}(\theta ,Z_{t_{M-1}})=(M-1)\ln(2\pi )+\sum _{k=1}^{M-1}\ln(\det(\Sigma _{t_{k}}))+\nu _{t_{k}}^{\intercal }\Sigma _{t_{k}}^{-1}\nu _{t_{k}},</math>

being <math>\nu _{t_{k}}</math> the discrete-time innovation (3) and <math>\Sigma _{t_{k}}</math> the innovation variance (4) of the model (1)-(2) at <math>t_{k}</math>, for all <math>k=1,\ldots ,M-1</math>. In the above expression for <math>U_{M}(\theta ,Z_{t_{M-1}})</math>, the conditional mean <math>\mathbf{x}_{t_{k}/t_{k-1}}(\theta )</math> and variance <math>\mathbf{U}_{t_{k}/t_{k-1}}(\theta )</math> are computed by the continuous-discrete filtering algorithm for the evolution of the moments, for all <math>k=1,\ldots ,M-1</math>.
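A minimal sketch of the objective function <math>U_{M}</math> in (5), evaluated from a given innovation sequence and its variances (the function name is illustrative; minimizing this expression over <math>\theta </math> yields the innovation estimator, and the additive constant does not affect the minimizer):

```python
import numpy as np

def innovation_cost(nu, Sigma):
    """U_M(theta, Z) as in eq. (5): (M-1)ln(2*pi) plus, for each k,
    ln det(Sigma_k) + nu_k' Sigma_k^{-1} nu_k."""
    K = len(nu)                        # K = M - 1 innovations
    U = K * np.log(2.0 * np.pi)
    for nu_k, S_k in zip(nu, Sigma):
        _, logdet = np.linalg.slogdet(S_k)
        U += logdet + float(nu_k @ np.linalg.solve(S_k, nu_k))
    return U
```

In practice this cost is passed to a numerical optimizer over the parameter vector, with `nu` and `Sigma` recomputed by the filter at each candidate <math>\theta </math>.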

Differences with the maximum likelihood estimator

The maximum likelihood estimator of the parameters <math>\theta </math> in the model (1)-(2) involves the evaluation of the usually unknown transition density function <math>p_{\theta }(t_{k+1}-t_{k},\mathbf{x}(t_{k}),\mathbf{x}(t_{k+1}))</math> between the states <math>\mathbf{x}(t_{k})</math> and <math>\mathbf{x}(t_{k+1})</math> of the diffusion process <math>\mathbf{x}</math> for all the observation times <math>t_{k}</math> and <math>t_{k+1}</math>. Instead, the innovation estimator (5) is obtained by maximizing the likelihood of the discrete-time innovation process <math>\{\nu _{t_{k}}\}_{k=1,\ldots ,M-1}</math>, taking into account that <math>\nu _{t_{1}},\ldots ,\nu _{t_{M-1}}</math> are Gaussian and independent random vectors. Remarkably, whereas the transition density function <math>p_{\theta }(t_{k+1}-t_{k},\mathbf{x}(t_{k}),\mathbf{x}(t_{k+1}))</math> changes when the SDE for <math>\mathbf{x}</math> does, the transition density function <math>\mathfrak{p}_{\theta }(t_{k+1}-t_{k},\nu _{t_{k}},\nu _{t_{k+1}})</math> for the innovation process remains Gaussian independently of the SDE for <math>\mathbf{x}</math>. Only when the diffusion <math>\mathbf{x}</math> is described by a linear SDE with additive noise is the density function <math>p_{\theta }(t_{k+1}-t_{k},\mathbf{x}(t_{k}),\mathbf{x}(t_{k+1}))</math> Gaussian and equal to <math>\mathfrak{p}_{\theta }(t_{k+1}-t_{k},\nu _{t_{k}},\nu _{t_{k+1}})</math>, in which case the maximum likelihood and the innovation estimators coincide. Otherwise, the innovation estimator is an approximation to the maximum likelihood estimator and, in this sense, a Quasi-Maximum Likelihood estimator. In addition, the innovation method is a particular instance of the Prediction Error method, and so the asymptotic results obtained for that general class of estimators are valid for the innovation estimators. Intuitively, from the typical control engineering viewpoint, the innovation process, viewed as a measure of the prediction errors of the fitted model, is expected to be approximately a white noise process when the model fits the data; this can be used as a practical tool for model design and for optimal experimental design.

Properties

The innovation estimator (5) has a number of important attributes:

• The <math>100(1-\alpha )\%</math> confidence limits <math>\widehat{\theta }_{M}\pm \triangle </math> for the innovation estimator <math>\widehat{\theta }_{M}</math> are estimated with

<math>\triangle =t_{1-\alpha ,M-p-1}\sqrt{\frac{\mathrm{diag}(Var(\widehat{\theta }_{M}))}{M-p}},</math>

where <math>t_{1-\alpha ,M-p-1}</math> is the Student's t distribution with <math>100(1-\alpha )\%</math> significance level and <math>M-p-1</math> degrees of freedom. Here, <math>Var(\widehat{\theta }_{M})=(I(\widehat{\theta }_{M}))^{-1}</math> denotes the variance of the innovation estimator <math>\widehat{\theta }_{M}</math>, where

<math>I(\widehat{\theta }_{M})=\sum _{k=1}^{M-1}I_{k}(\widehat{\theta }_{M})</math>

is the Fisher information matrix of the innovation estimator <math>\widehat{\theta }_{M}</math> of <math>\theta _{0}</math> and

<math>[I_{k}(\widehat{\theta }_{M})]_{m,n}=\frac{\partial \mu ^{\intercal }}{\partial \theta _{m}}\Sigma ^{-1}\frac{\partial \mu }{\partial \theta _{n}}+\frac{1}{2}\mathrm{trace}\left( \Sigma ^{-1}\frac{\partial \Sigma }{\partial \theta _{m}}\Sigma ^{-1}\frac{\partial \Sigma }{\partial \theta _{n}}\right)</math>

is the entry <math>(m,n)</math> of the matrix <math>I_{k}(\widehat{\theta }_{M})</math> with <math>\mu =\mathbf{C}\mathbf{x}_{t_{k}/t_{k-1}}(\widehat{\theta }_{M})</math> and <math>\Sigma =\Sigma _{t_{k}}(\widehat{\theta }_{M})</math>, for <math>1\leq m,n\leq p</math>.

• The distribution of the fitting-innovation process <math>\{\nu _{t_{k}}:\nu _{t_{k}}=\mathbf{z}_{t_{k}}-\mathbf{C}\mathbf{x}_{t_{k}/t_{k-1}}(\widehat{\theta }_{M})\}_{k=1,\ldots ,M-1}</math> measures the goodness of fit of the model to the data.
• For smooth enough functions <math>\mathbf{h}</math>, nonlinear observation equations of the form

<math>\mathbf{z}_{t_{k}}=\mathbf{h}(t_{k},\mathbf{x}(t_{k}))+\mathbf{e}_{t_{k}},\qquad (6)</math>

can be transformed into the simpler one (2), and the innovation estimator (5) can be applied.
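The per-observation Fisher information blocks <math>I_{k}(\widehat{\theta }_{M})</math> above can be evaluated as in the following illustrative Python sketch, assuming the derivatives of <math>\mu </math> and <math>\Sigma </math> with respect to the parameters are supplied (e.g., by finite differences); the function and argument names are not from any cited implementation:

```python
import numpy as np

def fisher_block(dmu, dSigma, Sigma):
    """Per-observation Fisher information I_k of the innovation likelihood.

    dmu    : (p, r) partial derivatives of mu w.r.t. each theta_m
    dSigma : (p, r, r) partial derivatives of Sigma w.r.t. each theta_m
    Sigma  : (r, r) innovation variance at t_k
    """
    p = dmu.shape[0]
    Sinv = np.linalg.inv(Sigma)
    I = np.empty((p, p))
    for m in range(p):
        for n in range(p):
            I[m, n] = dmu[m] @ Sinv @ dmu[n] \
                + 0.5 * np.trace(Sinv @ dSigma[m] @ Sinv @ dSigma[n])
    return I
```

Summing these blocks over <math>k=1,\ldots ,M-1</math> gives <math>I(\widehat{\theta }_{M})</math>, whose inverse estimates the estimator's variance used in the confidence limits above.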

Approximate Innovation estimators

In practice, closed-form expressions for computing <math>\mathbf{x}_{t_{k}/t_{k-1}}(\theta )</math> and <math>\mathbf{U}_{t_{k}/t_{k-1}}(\theta )</math> in (5) are available only for a few models (1)-(2). Therefore, approximate filtering algorithms such as the following are used in applications.

Given <math>M</math> measurements <math>Z_{t_{M-1}}</math> and the initial filter estimates <math>\mathbf{y}_{t_{0}/t_{0}}=\mathbf{x}_{t_{0}/t_{0}}</math>, <math>\mathbf{V}_{t_{0}/t_{0}}=\mathbf{U}_{t_{0}/t_{0}}</math>, the approximate Linear Minimum Variance (LMV) filter for the model (1)-(2) is iteratively defined at each observation time <math>t_{k}\in \{t\}_{M}</math> by the prediction estimates

<math>\mathbf{y}_{t_{k+1}/t_{k}}=E(\mathbf{y}(t_{k+1})|Z_{t_{k}})</math> and <math>\mathbf{V}_{t_{k+1}/t_{k}}=E(\mathbf{y}(t_{k+1})\mathbf{y}^{\intercal }(t_{k+1})|Z_{t_{k}})-\mathbf{y}_{t_{k+1}/t_{k}}\mathbf{y}_{t_{k+1}/t_{k}}^{\intercal },\qquad (7)</math>

with initial conditions <math>\mathbf{y}_{t_{k}/t_{k}}</math> and <math>\mathbf{V}_{t_{k}/t_{k}}</math>, and the filter estimates

<math>\mathbf{y}_{t_{k+1}/t_{k+1}}=\mathbf{y}_{t_{k+1}/t_{k}}+\mathbf{K}_{t_{k+1}}(\mathbf{z}_{t_{k+1}}-\mathbf{C}\mathbf{y}_{t_{k+1}/t_{k}})</math> and <math>\mathbf{V}_{t_{k+1}/t_{k+1}}=\mathbf{V}_{t_{k+1}/t_{k}}-\mathbf{K}_{t_{k+1}}\mathbf{C}\mathbf{V}_{t_{k+1}/t_{k}}\qquad (8)</math>

with filter gain

<math>\mathbf{K}_{t_{k+1}}=\mathbf{V}_{t_{k+1}/t_{k}}\mathbf{C}^{\intercal }(\mathbf{C}\mathbf{V}_{t_{k+1}/t_{k}}\mathbf{C}^{\intercal }+\Pi _{t_{k+1}})^{-1}</math>

for all <math>t_{k},t_{k+1}\in \{t\}_{M}</math>, where <math>\mathbf{y}</math> is an approximation to the solution <math>\mathbf{x}</math> of (1) on the observation times <math>\{t\}_{M}</math>.
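A single correction step (8) of the approximate LMV filter has the familiar Kalman-filter form; the following is a minimal Python sketch with illustrative names, assuming the predictions (7) are provided by some approximation scheme:

```python
import numpy as np

def lmv_update(y_pred, V_pred, z, C, Pi):
    """Filter step (8): correct the prediction (y_pred, V_pred) with z.

    y_pred : (d,) predicted state mean y_{t_{k+1}/t_k}
    V_pred : (d, d) predicted state variance V_{t_{k+1}/t_k}
    z      : (r,) observation z_{t_{k+1}}
    C, Pi  : observation matrix (r, d) and noise variance (r, r)
    """
    S = C @ V_pred @ C.T + Pi               # innovation variance, eq. (4)
    K = V_pred @ C.T @ np.linalg.inv(S)     # filter gain K_{t_{k+1}}
    y = y_pred + K @ (z - C @ y_pred)
    V = V_pred - K @ C @ V_pred
    return y, V
```

Iterating prediction and correction over all observation times yields the sequences <math>\widetilde{\nu }_{t_{k}}</math> and <math>\widetilde{\Sigma }_{t_{k}}</math> used by the approximate innovation estimator below.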

Given <math>M</math> measurements <math>Z_{t_{M-1}}</math> of the state space model (1)-(2) with <math>\theta =\theta _{0}</math> on <math>\{t\}_{M}</math>, the approximate innovation estimator for the parameters <math>\theta _{0}</math> of (1) is defined by

<math>\widehat{\vartheta }_{M}=\arg \{\min _{\theta \in \mathcal{D}_{\theta }}\ \widetilde{U}_{M}(\theta ,Z_{t_{M-1}})\},\qquad (9)</math>

where

<math>\widetilde{U}_{M}(\theta ,Z_{t_{M-1}})=(M-1)\ln(2\pi )+\sum _{k=1}^{M-1}\ln(\det(\widetilde{\Sigma }_{t_{k}}))+\widetilde{\nu }_{t_{k}}^{\intercal }(\widetilde{\Sigma }_{t_{k}})^{-1}\widetilde{\nu }_{t_{k}},</math>

being

<math>\widetilde{\nu }_{t_{k}}=\mathbf{z}_{t_{k}}-\mathbf{C}\mathbf{y}_{t_{k}/t_{k-1}}(\theta )</math> and <math>\widetilde{\Sigma }_{t_{k}}=\mathbf{C}\mathbf{V}_{t_{k}/t_{k-1}}(\theta )\mathbf{C}^{\intercal }+\Pi _{t_{k}}</math>

approximations to the discrete-time innovation (3) and innovation variance (4), respectively, resulting from the filtering algorithm (7)-(8).

For models with complete observations free of noise (i.e., with <math>\mathbf{C}=\mathbf{I}</math> and <math>\Pi _{t_{k}}=0</math> in (2)), the approximate innovation estimator (9) reduces to the known Quasi-Maximum Likelihood estimators for SDEs.

Main conventional-type estimators

Conventional-type innovation estimators are those (9) derived from conventional-type continuous-discrete or discrete-discrete approximate filtering algorithms. Among the approximate continuous-discrete filters are the innovation estimators based on Local Linearization (LL) filters, on the extended Kalman filter, and on second order filters. Approximate innovation estimators based on discrete-discrete filters result from the discretization of the SDE (1) by means of a numerical scheme. Typically, the effectiveness of these innovation estimators is directly related to the stability of the involved filtering algorithms.

A shared drawback of these conventional-type filters is that, once the observations are given, the error between the approximate and the exact innovation process is fixed and completely determined by the time distance between observations. This might produce a large bias in the approximate innovation estimators in some applications, a bias that cannot be corrected by increasing the number of observations. However, the conventional-type innovation estimators are useful in many practical situations for which only medium or low accuracy for the parameter estimation is required.

Order-β innovation estimators

Let us consider the finer time discretization <math>\left( \tau \right) _{h>0}=\{\tau _{n}:\tau _{n+1}-\tau _{n}\leq h{\text{ for }}n=0,1,\ldots ,N\}</math> of the time interval <math>[t_{0},t_{M-1}]</math> satisfying the condition <math>\left( \tau \right) _{h}\supset \{t\}_{M}</math>. Further, let <math>\mathbf{y}_{n}</math> be the approximate value of <math>\mathbf{x}(\tau _{n})</math> obtained from a discretization of the equation (1) for all <math>\tau _{n}\in \left( \tau \right) _{h}</math>, and

<math>\mathbf{y}=\{\mathbf{y}(t),t\in \lbrack t_{0},t_{M-1}]:\mathbf{y}(\tau _{n})=\mathbf{y}_{n}{\text{ for all }}\tau _{n}\in \left( \tau \right) _{h}\}\qquad (10)</math>

a continuous-time approximation to <math>\mathbf{x}</math>.

An order-<math>\beta </math> LMV filter is an approximate LMV filter for which <math>\mathbf{y}</math> is an order-<math>\beta </math> weak approximation to <math>\mathbf{x}</math> satisfying (10) and the weak convergence condition

sup t k t t k + 1 | E ( g ( x ( t ) ) | Z t k ) E ( g ( y ( t ) ) | Z t k ) | L k h β {\displaystyle \qquad \qquad {\underset {t_{k}\leq t\leq t_{k+1}}{\sup }}\left\vert E\left(g(\mathbf {x} (t))|Z_{t_{k}}\right)-E\left(g(\mathbf {y} (t))|Z_{t_{k}}\right)\right\vert \leq L_{k}h^{\beta }}

for all <math>t_{k},t_{k+1}\in \{t\}_{M}</math> and any <math>2(\beta +1)</math> times continuously differentiable function <math>g:\mathbb{R}^{d}\rightarrow \mathbb{R}</math> for which <math>g</math> and all its partial derivatives up to order <math>2(\beta +1)</math> have polynomial growth, <math>L_{k}</math> being a positive constant. This order-<math>\beta </math> LMV filter converges with rate <math>\beta </math> to the exact LMV filter as <math>h</math> goes to zero, where <math>h</math> is the maximum stepsize of the time discretization <math>(\tau )_{h}\supset \{t\}_{M}</math> on which the approximation <math>\mathbf{y}</math> to <math>\mathbf{x}</math> is defined.

An order-<math>\beta </math> innovation estimator is an approximate innovation estimator (9) for which the approximations to the discrete-time innovation (3) and innovation variance (4) result from an order-<math>\beta </math> LMV filter.

Approximations <math>\mathbf{y}</math> of any kind converging to <math>\mathbf{x}</math> in a weak sense can be used to design an order-<math>\beta </math> LMV filter and, consequently, an order-<math>\beta </math> innovation estimator. These order-<math>\beta </math> innovation estimators are intended for the recurrent practical situation in which a diffusion process must be identified from a reduced number of observations distant in time, or when high accuracy for the estimated parameters is required.

Properties

An order-<math>\beta </math> innovation estimator <math>\widehat{\theta }_{M}(h)</math> has a number of important properties:

• For each given data <math>Z_{t_{M-1}}</math> of <math>M</math> observations, <math>\widehat{\theta }_{M}(h)</math> converges to the exact innovation estimator <math>\widehat{\theta }_{M}</math> as the maximum stepsize <math>h</math> of the time discretization <math>\left( \tau \right) _{h}\supset \{t\}_{M}</math> goes to zero.
• For finite samples of <math>M</math> observations, the expected value of <math>\widehat{\theta }_{M}(h)</math> converges to the expected value of the exact innovation estimator <math>\widehat{\theta }_{M}</math> as <math>h</math> goes to zero.
• For an increasing number of observations, <math>\widehat{\theta }_{M}(h)</math> is asymptotically normally distributed and its bias decreases as <math>h</math> goes to zero.
• As with the convergence of the order-<math>\beta </math> LMV filter to the exact LMV filter, the convergence and asymptotic properties of <math>\widehat{\theta }_{M}(h)</math> impose no constraints on the time distance <math>t_{k+1}-t_{k}</math> between two consecutive observations <math>\mathbf{z}_{t_{k}}</math> and <math>\mathbf{z}_{t_{k+1}}</math>, nor on the time discretization <math>(\tau )_{h}\supset \{t\}_{M}</math>.
• Approximations to the Akaike or Bayesian information criteria and to the confidence limits are directly obtained by replacing the exact estimator <math>\widehat{\theta }_{M}</math> by its approximation <math>\widehat{\theta }_{M}(h)</math>. These approximations converge to the corresponding exact ones as the maximum stepsize <math>h</math> of the time discretization <math>\left( \tau \right) _{h}\supset \{t\}_{M}</math> goes to zero.
• The distribution of the approximate fitting-innovation process <math>\{\widetilde{\nu }_{t_{k}}:\widetilde{\nu }_{t_{k}}=\mathbf{z}_{t_{k}}-\mathbf{C}\mathbf{y}_{t_{k}/t_{k-1}}(\widehat{\theta }_{M}(h))\}_{k=1,\ldots ,M-1}</math> measures the goodness of fit of the model to the data, which is also used as a practical tool for model design and for optimal experimental design.
• For smooth enough functions <math>\mathbf{h}</math>, nonlinear observation equations of the form (6) can be transformed into the simpler one (2), and the order-<math>\beta </math> innovation estimator can be applied.
Fig. 1: Histograms of the differences <math>(\widehat{\alpha }_{M}-\widehat{\alpha }_{h,M}^{D},\widehat{\sigma }_{M}-\widehat{\sigma }_{h,M}^{D})</math> and <math>(\widehat{\alpha }_{M}-\widehat{\alpha }_{h,M},\widehat{\sigma }_{M}-\widehat{\sigma }_{h,M})</math> between the exact innovation estimator <math>(\widehat{\alpha }_{M},\widehat{\sigma }_{M})</math> and the conventional <math>(\widehat{\alpha }_{h,M}^{D},\widehat{\sigma }_{h,M}^{D})</math> and order-<math>1</math> <math>(\widehat{\alpha }_{h,M},\widehat{\sigma }_{h,M})</math> innovation estimators for the parameters <math>(\alpha ,\sigma )</math> of the model (11)-(12), given <math>100</math> time series of <math>M=10</math> noisy observations on the time interval <math>[0.5,0.5+M-1]</math> with sampling period <math>\Delta =1</math>.

Figure 1 presents the histograms of the differences <math>(\widehat{\alpha}_{M}-\widehat{\alpha}_{h,M}^{D},\widehat{\sigma}_{M}-\widehat{\sigma}_{h,M}^{D})</math> and <math>(\widehat{\alpha}_{M}-\widehat{\alpha}_{h,M},\widehat{\sigma}_{M}-\widehat{\sigma}_{h,M})</math> between the exact innovation estimator <math>(\widehat{\alpha}_{M},\widehat{\sigma}_{M})</math> and the conventional <math>(\widehat{\alpha}_{h,M}^{D},\widehat{\sigma}_{h,M}^{D})</math> and order-<math>1</math> <math>(\widehat{\alpha}_{h,M},\widehat{\sigma}_{h,M})</math> innovation estimators for the parameters <math>\alpha=-0.1</math> and <math>\sigma=0.1</math> of the equation

<math>\qquad dx=\alpha t x\,dt+\sigma\sqrt{t}\,x\,dw \quad (11)</math>

obtained from 100 time series <math>z_{t_{0}},\ldots,z_{t_{M-1}}</math> of <math>M</math> noisy observations

<math>\qquad z_{t_{k}}=x(t_{k})+e_{t_{k}},\text{ for }k=0,1,\ldots,M-1, \quad (12)</math>

of <math>x</math> at the observation times <math>\{t\}_{M=10}=\{t_{k}=0.5+k\Delta : k=0,\ldots,M-1\}</math>, <math>\Delta=1</math>, with <math>x(0.5)=1</math> and <math>\Pi_{k}=0.0001</math>. The classical and the order-<math>1</math> Local Linearization filters of the innovation estimators <math>(\widehat{\alpha}_{h,M}^{D},\widehat{\sigma}_{h,M}^{D})</math> and <math>(\widehat{\alpha}_{h,M},\widehat{\sigma}_{h,M})</math> are defined as in , respectively, on the uniform time discretizations <math>\left(\tau\right)_{h=\Delta}\equiv\{t\}_{M}</math> and <math>\left(\tau\right)_{h=\Delta/2,\Delta/8,\Delta/32}=\{\tau_{n}:\tau_{n}=0.5+nh,\ n=0,1,\ldots,(M-1)/h\}</math>. The number of stochastic simulations of the order-<math>1</math> Local Linearization filter is estimated via an adaptive sampling algorithm with moderate tolerance. Figure 1 illustrates the convergence of the order-<math>1</math> innovation estimator <math>(\widehat{\alpha}_{h,M},\widehat{\sigma}_{h,M})</math> to the exact innovation estimator <math>(\widehat{\alpha}_{M},\widehat{\sigma}_{M})</math> as <math>h</math> decreases, which substantially improves the estimation provided by the conventional innovation estimator <math>(\widehat{\alpha}_{\Delta,M}^{D},\widehat{\sigma}_{\Delta,M}^{D})</math>.
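The data for such an experiment can be generated with a simple Euler–Maruyama discretization of the diffusion in (11), assuming the drift <math>\alpha t x</math>, plus additive observation noise as in (12). The fine internal stepsize, function name, and default values below are illustrative assumptions; this is a generic weak simulation scheme, not the Local Linearization scheme used by the filters themselves.

```python
import numpy as np

def simulate_model(alpha=-0.1, sigma=0.1, x0=1.0, t0=0.5, M=10, delta=1.0,
                   n_sub=100, obs_var=1e-4, seed=None):
    """Euler-Maruyama simulation of dx = alpha*t*x dt + sigma*sqrt(t)*x dw,
    observed with additive Gaussian noise (variance obs_var) at the sampling
    times t_k = t0 + k*delta, k = 0,...,M-1."""
    rng = np.random.default_rng(seed)
    h = delta / n_sub          # fine integration stepsize between observations
    x, t = x0, t0
    z = []
    for k in range(M):
        z.append(x + rng.normal(0.0, np.sqrt(obs_var)))   # noisy observation
        if k < M - 1:
            for _ in range(n_sub):                        # integrate to t_{k+1}
                dw = rng.normal(0.0, np.sqrt(h))
                x += alpha * t * x * h + sigma * np.sqrt(t) * x * dw
                t += h
    return np.array(z)
```

Repeating this simulation 100 times with independent noise realizations yields the kind of ensemble of time series from which the histograms in Figure 1 are built.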

Deterministic approximations

The order-<math>\beta</math> innovation estimators overcome the drawback of the conventional-type innovation estimators concerning the impossibility of reducing bias. However, a viable bias reduction of an order-<math>\beta</math> innovation estimator might eventually require that the associated order-<math>\beta</math> LMV filter performs a large number of stochastic simulations. In situations where only low- or medium-precision approximate estimators are needed, an alternative deterministic filter algorithm, called the deterministic order-<math>\beta</math> LMV filter, can be obtained by tracking the first two conditional moments <math>\mu</math> and <math>\Lambda</math> of the order-<math>\beta</math> weak approximation <math>\mathbf{y}</math> at all the time instants <math>\tau_{n}\in\left(\tau\right)_{h}</math> in between two consecutive observation times <math>t_{k}</math> and <math>t_{k+1}</math>. That is, the values of the predictions <math>\mathbf{y}_{t_{k+1}/t_{k}}</math> and <math>\mathbf{P}_{t_{k+1}/t_{k}}</math> in the filtering algorithm are computed from the recursive formulas

<math>\qquad\qquad \mathbf{y}_{\tau_{n+1}/t_{k}}=\mu(\tau_{n},\mathbf{y}_{\tau_{n}/t_{k}};h_{n})\quad</math> and <math>\quad \mathbf{P}_{\tau_{n+1}/t_{k}}=\Lambda(\tau_{n},\mathbf{P}_{\tau_{n}/t_{k}};h_{n}),\quad</math> with <math>\tau_{n},\tau_{n+1}\in(\tau)_{h}\cap\lbrack t_{k},t_{k+1}],</math>

and with <math>h_{n}=\tau_{n+1}-\tau_{n}</math>. The approximate innovation estimators <math>\widehat{\mathbf{\theta}}_{h,M}</math> defined with these deterministic order-<math>\beta</math> LMV filters no longer converge to the exact innovation estimator, but they allow a significant bias reduction in the estimated parameters for a given finite sample at a lower computational cost.
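A minimal sketch of this between-observations moment recursion for a scalar SDE <math>dx=f(t,x)\,dt+g(t,x)\,dw</math> is given below, using a first-order Euler-type stand-in for the maps <math>\mu</math> and <math>\Lambda</math>. The actual deterministic LMV filters use Local Linearization formulas for these maps; the function name and the Euler-type updates are illustrative assumptions only.

```python
import numpy as np

def predict_moments(f, dfdx, g, y0, p0, t0, t1, h):
    """Propagate the approximate conditional mean y and variance P of a scalar
    SDE dx = f(t,x) dt + g(t,x) dw between two observation times t0 < t1,
    stepping over the intermediate instants tau_n with stepsizes h_n <= h.
    The updates below are an Euler-type stand-in for the mu and Lambda maps."""
    y, p, t = y0, p0, t0
    while t < t1 - 1e-12:
        hn = min(h, t1 - t)                       # h_n = tau_{n+1} - tau_n
        a = dfdx(t, y)                            # local drift Jacobian
        p = p + (2.0 * a * p + g(t, y) ** 2) * hn  # P update (Lambda map)
        y = y + f(t, y) * hn                       # mean update (mu map)
        t += hn
    return y, p
```

For a linear drift <math>f(t,x)=ax</math> with no noise, the recursion reproduces the exact mean <math>y_0 e^{a(t_1-t_0)}</math> up to the first-order discretization error, which shrinks as <math>h</math> decreases, mirroring the bias reduction described above.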

Fig. 2 Histograms and confidence limits for the innovation estimators <math>(\widehat{\alpha}_{h,M},\widehat{\sigma}_{h,M})</math> and <math>(\widehat{\alpha}_{\cdot,M},\widehat{\sigma}_{\cdot,M})</math> of <math>(\alpha,\sigma)</math> computed with the deterministic order-1 LL filter on uniform <math>\left(\tau\right)_{h,M}</math> and adaptive <math>\left(\tau\right)_{\cdot,M}</math> time discretizations, respectively, from <math>100</math> noisy realizations of the Van der Pol model (13)-(15) with sampling period <math>\Delta=1</math> on the time interval <math>[0,M-1]</math> and <math>M=30</math>. Observe the bias reduction of the estimated parameters as <math>h</math> decreases.

Figure 2 presents the histograms and the confidence limits of the approximate innovation estimators <math>(\widehat{\alpha}_{h,M},\widehat{\sigma}_{h,M})</math> and <math>(\widehat{\alpha}_{\cdot,M},\widehat{\sigma}_{\cdot,M})</math> for the parameters <math>\alpha=1</math> and <math>\sigma=1</math> of the Van der Pol oscillator with random frequency

<math>\qquad dx_{1}=x_{2}\,dt \quad (13)</math>

<math>\qquad dx_{2}=(-(x_{1}^{2}-1)x_{2}-\alpha x_{1})\,dt+\sigma x_{1}\,dw \quad (14)</math>

obtained from 100 time series <math>z_{t_{0}},\ldots,z_{t_{M-1}}</math> of <math>M</math> partial and noisy observations

<math>\qquad z_{t_{k}}=x_{1}(t_{k})+e_{t_{k}},\text{ for }k=0,1,\ldots,M-1, \quad (15)</math>

of <math>x</math> at the observation times <math>\{t\}_{M=30}=\{t_{k}=k\Delta : k=0,\ldots,M-1\}</math>, <math>\Delta=1</math>, with <math>(x_{1}(0),x_{2}(0))=(1,1)</math> and <math>\Pi_{k}=0.001</math>. The deterministic order-<math>1</math> Local Linearization filter of the innovation estimators <math>(\widehat{\alpha}_{h,M},\widehat{\sigma}_{h,M})</math> and <math>(\widehat{\alpha}_{\cdot,M},\widehat{\sigma}_{\cdot,M})</math> is defined , for each estimator, on the uniform time discretizations <math>\left(\tau\right)_{h}=\{\tau_{n}:\tau_{n}=nh,\ n=0,1,\ldots,(M-1)/h\}</math> and on an adaptive time-stepping discretization <math>\left(\tau\right)_{\cdot}</math> with moderate relative and absolute tolerances, respectively. Observe the bias reduction of the estimated parameters as <math>h</math> decreases.
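The noisy realizations of the Van der Pol model (13)-(15) used in such an experiment can be generated with a simple Euler–Maruyama scheme. As before, the function name, internal stepsize, and defaults are illustrative assumptions, and this generic scheme is not the Local Linearization discretization used by the filter.

```python
import numpy as np

def simulate_vdp(alpha=1.0, sigma=1.0, x0=(1.0, 1.0), M=30, delta=1.0,
                 n_sub=200, obs_var=1e-3, seed=None):
    """Euler-Maruyama simulation of the stochastic Van der Pol model
    dx1 = x2 dt, dx2 = (-(x1^2 - 1)*x2 - alpha*x1) dt + sigma*x1 dw,
    with partial noisy observations z_k = x1(t_k) + e_k at t_k = k*delta."""
    rng = np.random.default_rng(seed)
    h = delta / n_sub
    x1, x2 = x0
    z = []
    for k in range(M):
        z.append(x1 + rng.normal(0.0, np.sqrt(obs_var)))  # observe x1 only
        if k < M - 1:
            for _ in range(n_sub):
                dw = rng.normal(0.0, np.sqrt(h))
                # simultaneous update: both increments use the state at tau_n
                x1, x2 = (x1 + x2 * h,
                          x2 + (-(x1**2 - 1.0) * x2 - alpha * x1) * h
                          + sigma * x1 * dw)
    return np.array(z)
```

Only <math>x_{1}</math> is observed here, which is what makes the observations "partial" in the sense of equation (15).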

Software

A Matlab implementation of various approximate innovation estimators is provided by the SdeEstimation toolbox. This toolbox includes Local Linearization filters, with deterministic and stochastic variants and with fixed step sizes and sample numbers. It also offers adaptive time-stepping and sampling algorithms, along with local and global optimization algorithms for the innovation estimation. For models with complete observations free of noise, various approximations to the Quasi-Maximum Likelihood estimator are implemented in R.

References

  1. Ozaki, Tohru (1994). "The Local Linearization Filter with Application to Nonlinear System Identifications". In Bozdogan, H.; Sclove, S. L.; Gupta, A. K.; Haughton, D. (eds.). Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach: Volume 3, Engineering and Scientific Applications. Dordrecht: Springer Netherlands. pp. 217–240. doi:10.1007/978-94-011-0854-6_10. ISBN 978-94-011-0854-6.
  2. Jazwinski, A. H. (1970). Stochastic Processes and Filtering Theory. New York: Academic Press.
  3. Nielsen, Jan Nygaard; Vestergaard, Martin (2000). "Estimation in continuous-time stochastic volatility models using nonlinear filters". International Journal of Theoretical and Applied Finance. 3 (2): 279–308. doi:10.1142/S0219024900000139. ISSN 0219-0249.
  4. Kailath, T. (1981). Lectures on Wiener and Kalman Filtering. New York: Springer-Verlag.
  5. Jimenez, J. C.; Ozaki, T. (2006). "An Approximate Innovation Method for the Estimation of Diffusion Processes from Discrete Data". Journal of Time Series Analysis. 27 (1): 77–97. doi:10.1111/j.1467-9892.2005.00454.x. ISSN 0143-9782.
  6. Jimenez, J. C.; Yoshimoto, A.; Miwakeichi, F. (2021). "State and parameter estimation of stochastic physical systems from uncertain and indirect measurements". The European Physical Journal Plus. 136: 869. doi:10.1140/epjp/s13360-021-01859-1. ISSN 2190-5444.
  7. Schweppe, F. (1965). "Evaluation of likelihood functions for Gaussian signals". IEEE Transactions on Information Theory. 11 (1): 61–70. doi:10.1109/TIT.1965.1053737. ISSN 1557-9654.
  8. Ljung, L. (1999). System Identification: Theory for the User (2nd ed.). Englewood Cliffs: Prentice Hall.
  9. Ljung, Lennart; Caines, Peter E. (1980). "Asymptotic normality of prediction error estimators for approximate system models". Stochastics. 3 (1–4): 29–46. doi:10.1080/17442507908833135. ISSN 0090-9491.
  10. Nolsoe, K.; Nielsen, J. N.; Madsen, H. (2000). "Prediction-based estimating function for diffusion processes with measurement noise". Technical Reports 2000, No. 10. Informatics and Mathematical Modelling, Technical University of Denmark.
  11. Ozaki, T.; Jimenez, J. C.; Haggan-Ozaki, V. (2000). "The Role of the Likelihood Function in the Estimation of Chaos Models". Journal of Time Series Analysis. 21 (4): 363–387. doi:10.1111/1467-9892.00189. ISSN 0143-9782.
  12. Jimenez, J. C. (2020). "Bias reduction in the estimation of diffusion processes from discrete observations". IMA Journal of Mathematical Control and Information. pp. 1468–1505. doi:10.1093/imamci/dnaa021. Retrieved 2023-07-06.
  13. Jimenez, J. C. (2019). "Approximate linear minimum variance filters for continuous-discrete state space models: convergence and practical adaptive algorithms". IMA Journal of Mathematical Control and Information. pp. 341–378. doi:10.1093/imamci/dnx047. Retrieved 2023-07-06.
  14. Shoji, Isao (1998). "A comparative study of maximum likelihood estimators for nonlinear dynamical system models". International Journal of Control. 71 (3): 391–404. doi:10.1080/002071798221731. ISSN 0020-7179.
  15. Nielsen, Jan Nygaard; Madsen, Henrik (2001). "Applying the EKF to stochastic differential equations with level effects". Automatica. 37 (1): 107–112. doi:10.1016/S0005-1098(00)00128-X. ISSN 0005-1098.
  16. Singer, Hermann (2002). "Parameter Estimation of Nonlinear Stochastic Differential Equations: Simulated Maximum Likelihood versus Extended Kalman Filter and Itô-Taylor Expansion". Journal of Computational and Graphical Statistics. 11 (4): 972–995. doi:10.1198/106186002808. ISSN 1061-8600.
  17. Ozaki, Tohru; Iino, Mitsunori (2001). "An innovation approach to non-Gaussian time series analysis". Journal of Applied Probability. 38 (A): 78–92. doi:10.1239/jap/1085496593. ISSN 0021-9002.
  18. Peng, H.; Ozaki, T.; Jimenez, J. C. (2002). "Modeling and control for foreign exchange based on a continuous time stochastic microstructure model". Proceedings of the 41st IEEE Conference on Decision and Control. 4: 4440–4445. doi:10.1109/CDC.2002.1185071.
  19. Kloeden, P. E.; Platen, E. (1999). Numerical Solution of Stochastic Differential Equations (3rd ed.). Berlin: Springer.
  20. "GitHub - locallinearization/SdeEstimation". GitHub. Retrieved 2023-07-06.
  21. Iacus, S. M. (2008). Simulation and Inference for Stochastic Differential Equations: With R Examples. New York: Springer.