Particle filter

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.

Particle filters, or sequential Monte Carlo methods, are a set of Monte Carlo algorithms used to find approximate solutions for filtering problems for nonlinear state-space systems, such as signal processing and Bayesian statistical inference. The filtering problem consists of estimating the internal states in dynamical systems when partial observations are made and random perturbations are present in the sensors as well as in the dynamical system. The objective is to compute the posterior distributions of the states of a Markov process, given the noisy and partial observations. The term "particle filters" was first coined in 1996 by Pierre Del Moral about mean-field interacting particle methods used in fluid mechanics since the beginning of the 1960s. The term "Sequential Monte Carlo" was coined by Jun S. Liu and Rong Chen in 1998.

Particle filtering uses a set of particles (also called samples) to represent the posterior distribution of a stochastic process given the noisy and/or partial observations. The state-space model can be nonlinear and the initial state and noise distributions can take any form required. Particle filter techniques provide a well-established methodology for generating samples from the required distribution without requiring assumptions about the state-space model or the state distributions. However, these methods do not perform well when applied to very high-dimensional systems.

Particle filters update their prediction in an approximate (statistical) manner. The samples from the distribution are represented by a set of particles; each particle has a likelihood weight assigned to it that represents the probability of that particle being sampled from the probability density function. Weight disparity leading to weight collapse is a common issue encountered in these filtering algorithms. However, it can be mitigated by including a resampling step before the weights become too uneven. Several adaptive resampling criteria can be used, including the variance of the weights and the relative entropy with respect to the uniform distribution. In the resampling step, the particles with negligible weights are replaced by new particles in the proximity of the particles with higher weights.
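The two adaptive resampling criteria mentioned above can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the function names, the use of the effective sample size as a summary of the weight variance, and the threshold of half the population are all assumptions made for the example.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def effective_sample_size(weights):
    """Return 1 / sum(w_i^2) for the normalized weights (between 1 and N).

    The value is N for uniform weights and shrinks as the weight
    variance grows.
    """
    w = normalize(weights)
    return 1.0 / sum(wi * wi for wi in w)


def relative_entropy_to_uniform(weights):
    """Relative entropy of the normalized weights with respect to the uniform law."""
    w = normalize(weights)
    n = len(w)
    return sum(wi * math.log(wi * n) for wi in w if wi > 0.0)


def should_resample(weights, ess_fraction=0.5):
    """Hypothetical rule: resample once the ESS falls below ess_fraction * N."""
    return effective_sample_size(weights) < ess_fraction * len(weights)
```

With uniform weights the effective sample size equals the number of particles and the relative entropy is zero; when a single particle carries all the weight, the effective sample size is one and the relative entropy is log N.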

From the statistical and probabilistic point of view, particle filters may be interpreted as mean-field particle interpretations of Feynman-Kac probability measures. These particle integration techniques were developed in molecular chemistry and computational physics by Theodore E. Harris and Herman Kahn in 1951, Marshall N. Rosenbluth and Arianna W. Rosenbluth in 1955, and more recently by Jack H. Hetherington in 1984. In computational physics, these Feynman-Kac type path particle integration methods are also used in Quantum Monte Carlo, and more specifically Diffusion Monte Carlo methods. Feynman-Kac interacting particle methods are also strongly related to mutation-selection genetic algorithms currently used in evolutionary computation to solve complex optimization problems.

The particle filter methodology is used to solve Hidden Markov Model (HMM) and nonlinear filtering problems. With the notable exception of linear-Gaussian signal-observation models (Kalman filter) or wider classes of models (Benes filter), Mireille Chaleyat-Maurel and Dominique Michel proved in 1984 that the sequence of posterior distributions of the random states of a signal, given the observations (a.k.a. optimal filter), has no finite recursion. Various other numerical methods based on fixed grid approximations, Markov Chain Monte Carlo techniques, conventional linearization, extended Kalman filters, or determining the best linear system (in the expected cost-error sense) are unable to cope with large-scale systems, unstable processes, or insufficiently smooth nonlinearities.

Particle filters and Feynman-Kac particle methodologies find application in signal and image processing, Bayesian inference, machine learning, risk analysis and rare event sampling, engineering and robotics, artificial intelligence, bioinformatics, phylogenetics, computational science, economics and mathematical finance, molecular chemistry, computational physics, pharmacokinetics, quantitative risk and insurance and other fields.

History

Heuristic-like algorithms

From a statistical and probabilistic viewpoint, particle filters belong to the class of branching/genetic type algorithms, and mean-field type interacting particle methodologies. The interpretation of these particle methods depends on the scientific discipline. In Evolutionary Computing, mean-field genetic type particle methodologies are often used as heuristic and natural search algorithms (a.k.a. Metaheuristic). In computational physics and molecular chemistry, they are used to solve Feynman-Kac path integration problems or to compute Boltzmann-Gibbs measures, top eigenvalues, and ground states of Schrödinger operators. In Biology and Genetics, they represent the evolution of a population of individuals or genes in some environment.

The origins of mean-field type evolutionary computational techniques can be traced back to 1950 and 1954 with Alan Turing's work on genetic type mutation-selection learning machines and the articles by Nils Aall Barricelli at the Institute for Advanced Study in Princeton, New Jersey. The first trace of particle filters in statistical methodology dates back to the mid-1950s; the 'Poor Man's Monte Carlo', that was proposed by Hammersley et al., in 1954, contained hints of the genetic type particle filtering methods used today. In 1963, Nils Aall Barricelli simulated a genetic type algorithm to mimic the ability of individuals to play a simple game. In evolutionary computing literature, genetic-type mutation-selection algorithms became popular through the seminal work of John Holland in the early 1970s, particularly his book published in 1975.

In Biology and Genetics, the Australian geneticist Alex Fraser also published in 1957 a series of papers on the genetic type simulation of artificial selection of organisms. The computer simulation of the evolution by biologists became more common in the early 1960s, and the methods were described in books by Fraser and Burnell (1970) and Crosby (1973). Fraser's simulations included all of the essential elements of modern mutation-selection genetic particle algorithms.

From the mathematical viewpoint, the conditional distribution of the random states of a signal given some partial and noisy observations is described by a Feynman-Kac probability on the random trajectories of the signal weighted by a sequence of likelihood potential functions. Quantum Monte Carlo, and more specifically Diffusion Monte Carlo methods can also be interpreted as a mean-field genetic type particle approximation of Feynman-Kac path integrals. The origins of Quantum Monte Carlo methods are often attributed to Enrico Fermi and Robert Richtmyer who developed in 1948 a mean-field particle interpretation of neutron-chain reactions, but the first heuristic-like and genetic type particle algorithm (a.k.a. Resampled or Reconfiguration Monte Carlo methods) for estimating ground state energies of quantum systems (in reduced matrix models) is due to Jack H. Hetherington in 1984. One can also quote the earlier seminal works of Theodore E. Harris and Herman Kahn in particle physics, published in 1951, using mean-field but heuristic-like genetic methods for estimating particle transmission energies. In molecular chemistry, the use of genetic heuristic-like particle methodologies (a.k.a. pruning and enrichment strategies) can be traced back to 1955 with the seminal work of Marshall N. Rosenbluth and Arianna W. Rosenbluth.

The use of genetic particle algorithms in advanced signal processing and Bayesian inference is more recent. In January 1993, Genshiro Kitagawa developed a "Monte Carlo filter"; a slightly modified version of his article appeared in 1996. In April 1993, Gordon et al. published their seminal work applying a genetic type algorithm to Bayesian statistical inference. The authors named their algorithm 'the bootstrap filter' and demonstrated that, compared to other filtering methods, it does not require any assumption about the state space or the noise of the system. Independently, Pierre Del Moral, as well as Himilcon Carvalho, Pierre Del Moral, André Monin, and Gérard Salut, published articles on particle filters in the mid-1990s. Particle filters were also developed in signal processing in 1989-1992 by P. Del Moral, J.C. Noyer, G. Rigal, and G. Salut at the LAAS-CNRS in a series of restricted and classified research reports with STCAN (Service Technique des Constructions et Armes Navales), the IT company DIGILOG, and the LAAS-CNRS (the Laboratory for Analysis and Architecture of Systems) on RADAR/SONAR and GPS signal processing problems.

Mathematical foundations

From 1950 to 1996, all the publications on particle filters, and genetic algorithms, including the pruning and resample Monte Carlo methods introduced in computational physics and molecular chemistry, present natural and heuristic-like algorithms applied to different situations without a single proof of their consistency, nor a discussion on the bias of the estimates and genealogical and ancestral tree-based algorithms.

The mathematical foundations and the first rigorous analysis of these particle algorithms are due to Pierre Del Moral in 1996. The article also contains proof of the unbiased properties of a particle approximation of likelihood functions and unnormalized conditional probability measures. The unbiased particle estimator of the likelihood functions presented in this article is used today in Bayesian statistical inference.

Dan Crisan, Jessica Gaines, and Terry Lyons, as well as Pierre Del Moral and Terry Lyons, created branching-type particle techniques with various population sizes around the end of the 1990s. P. Del Moral, A. Guionnet, and L. Miclo made more advances in this subject in 2000. The first central limit theorems were proved by Pierre Del Moral and Alice Guionnet in 1999 and by Pierre Del Moral and Laurent Miclo in 2000. The first uniform convergence results concerning the time parameter for particle filters were developed at the end of the 1990s by Pierre Del Moral and Alice Guionnet. The first rigorous analysis of genealogical tree-based particle filter smoothers is due to P. Del Moral and L. Miclo in 2001.

The theory on Feynman-Kac particle methodologies and related particle filter algorithms was developed in books published in 2000 and 2004. These abstract probabilistic models encapsulate genetic type algorithms, particle and bootstrap filters, interacting Kalman filters (a.k.a. Rao–Blackwellized particle filter), and importance sampling and resampling style particle filter techniques, including genealogical tree-based and particle backward methodologies for solving filtering and smoothing problems. Other classes of particle filtering methodologies include genealogical tree-based models, backward Markov particle models, adaptive mean-field particle models, island-type particle models, particle Markov chain Monte Carlo methodologies, Sequential Monte Carlo samplers, Sequential Monte Carlo Approximate Bayesian Computation methods, and Sequential Monte Carlo ABC based Bayesian Bootstrap.

The filtering problem

Objective

A particle filter's goal is to estimate the posterior density of state variables given observation variables. The particle filter is intended for use with a hidden Markov Model, in which the system includes both hidden and observable variables. The observable variables (observation process) are linked to the hidden variables (state-process) via a known functional form. Similarly, the probabilistic description of the dynamical system defining the evolution of the state variables is known.

A generic particle filter estimates the posterior distribution of the hidden states using the observation measurement process. With respect to a state-space such as the one below:

$$\begin{array}{cccccccccc}X_{0}&\to &X_{1}&\to &X_{2}&\to &X_{3}&\to &\cdots &\text{signal}\\\downarrow &&\downarrow &&\downarrow &&\downarrow &&\cdots &\\Y_{0}&&Y_{1}&&Y_{2}&&Y_{3}&&\cdots &\text{observation}\end{array}$$

the filtering problem is to estimate sequentially the values of the hidden states $X_{k}$, given the values of the observation process $Y_{0},\cdots ,Y_{k}$, at any time step k.

All Bayesian estimates of $X_{k}$ follow from the posterior density $p(x_{k}|y_{0},y_{1},...,y_{k})$. The particle filter methodology provides an approximation of these conditional probabilities using the empirical measure associated with a genetic type particle algorithm. In contrast, the Markov Chain Monte Carlo or importance sampling approach would model the full posterior $p(x_{0},x_{1},...,x_{k}|y_{0},y_{1},...,y_{k})$.

The Signal-Observation model

Particle methods often assume $X_{k}$ and the observations $Y_{k}$ can be modeled in this form:

  • $X_{0},X_{1},\cdots$ is a Markov process on $\mathbb{R}^{d_{x}}$ (for some $d_{x}\geqslant 1$) that evolves according to the transition probability density $p(x_{k}|x_{k-1})$. This model is also often written in a synthetic way as
    $X_{k}|X_{k-1}=x_{k-1}\sim p(x_{k}|x_{k-1})$
    with an initial probability density $p(x_{0})$.
  • The observations $Y_{0},Y_{1},\cdots$ take values in some state space on $\mathbb{R}^{d_{y}}$ (for some $d_{y}\geqslant 1$) and are conditionally independent provided that $X_{0},X_{1},\cdots$ are known. In other words, each $Y_{k}$ only depends on $X_{k}$. In addition, we assume the conditional distributions of $Y_{k}$ given $X_{k}=x_{k}$ are absolutely continuous, and in a synthetic way we have
    $Y_{k}|X_{k}=x_{k}\sim p(y_{k}|x_{k})$

An example of a system with these properties is:

$$X_{k}=g(X_{k-1})+W_{k-1}$$
$$Y_{k}=h(X_{k})+V_{k}$$

where both $W_{k}$ and $V_{k}$ are mutually independent sequences with known probability density functions and g and h are known functions. These two equations can be viewed as state space equations and look similar to the state space equations for the Kalman filter. If the functions g and h in the above example are linear, and if both $W_{k}$ and $V_{k}$ are Gaussian, the Kalman filter finds the exact Bayesian filtering distribution. If not, Kalman filter-based methods are a first-order approximation (EKF) or a second-order approximation (UKF in general, but if the probability distribution is Gaussian a third-order approximation is possible).
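The example system above can be simulated directly, which is a convenient way to produce test data for a filter. The sketch below assumes Gaussian noise sequences and a fixed initial state; the particular functions g and h are hypothetical choices made only to have a nonlinear example, and the names `simulate`, `sigma_w` and `sigma_v` are not part of the article.

```python
import random


def g(x):
    # Hypothetical nonlinear state transition, chosen only for illustration.
    return 0.5 * x + 25.0 * x / (1.0 + x * x)


def h(x):
    # Hypothetical nonlinear observation function, chosen only for illustration.
    return x * x / 20.0


def simulate(g, h, x0, n_steps, sigma_w=1.0, sigma_v=1.0, seed=0):
    """Simulate X_k = g(X_{k-1}) + W_{k-1} and Y_k = h(X_k) + V_k.

    Assumes independent centered Gaussian noises W and V with standard
    deviations sigma_w and sigma_v, and a deterministic initial state x0.
    Returns the hidden states and the observations as two lists.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for k in range(n_steps):
        if k > 0:
            x = g(x) + rng.gauss(0.0, sigma_w)   # signal transition
        xs.append(x)
        ys.append(h(x) + rng.gauss(0.0, sigma_v))  # noisy partial observation
    return xs, ys


states, observations = simulate(g, h, x0=0.1, n_steps=50)
```

Only the list of observations would be available to a filter; the list of states is kept here so that estimates can be compared with the simulated truth.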

The assumption that the initial distribution and the transitions of the Markov chain are continuous for the Lebesgue measure can be relaxed. To design a particle filter we simply need to assume that we can sample the transitions $X_{k-1}\to X_{k}$ of the Markov chain $X_{k}$, and to compute the likelihood function $x_{k}\mapsto p(y_{k}|x_{k})$ (see for instance the genetic selection mutation description of the particle filter given below). The continuous assumption on the Markov transitions of $X_{k}$ is only used to derive in an informal (and rather abusive) way different formulae between posterior distributions using the Bayes' rule for conditional densities.

Approximate Bayesian computation models

Main article: Approximate Bayesian computation

In certain problems, the conditional distribution of observations, given the random states of the signal, may fail to have a density, or the density may be impossible or too complex to compute. In this situation, an additional level of approximation is necessary. One strategy is to replace the signal $X_{k}$ by the Markov chain $\mathcal{X}_{k}=\left(X_{k},Y_{k}\right)$ and to introduce a virtual observation of the form

$$\mathcal{Y}_{k}=Y_{k}+\epsilon \mathcal{V}_{k}\quad \text{for some parameter}\quad \epsilon \in [0,1]$$

for some sequence of independent random variables $\mathcal{V}_{k}$ with known probability density functions. The central idea is to observe that

$$\text{Law}\left(X_{k}|\mathcal{Y}_{0}=y_{0},\cdots ,\mathcal{Y}_{k}=y_{k}\right)\approx _{\epsilon \downarrow 0}\text{Law}\left(X_{k}|Y_{0}=y_{0},\cdots ,Y_{k}=y_{k}\right)$$

The particle filter associated with the Markov process $\mathcal{X}_{k}=\left(X_{k},Y_{k}\right)$ given the partial observations $\mathcal{Y}_{0}=y_{0},\cdots ,\mathcal{Y}_{k}=y_{k}$ is defined in terms of particles evolving in $\mathbb{R}^{d_{x}+d_{y}}$ with a likelihood function given, with some obvious abusive notation, by $p(\mathcal{Y}_{k}|\mathcal{X}_{k})$. These probabilistic techniques are closely related to Approximate Bayesian Computation (ABC). In the context of particle filters, these ABC particle filtering techniques were introduced in 1998 by P. Del Moral, J. Jacod and P. Protter. They were further developed by P. Del Moral, A. Doucet and A. Jasra.
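The construction above can be sketched for a scalar model in which observations can be simulated but their density is never evaluated. This is only an illustration under stated assumptions: the virtual noise is taken to be standard Gaussian, the parameter epsilon must be strictly positive for the kernel to have a density, and the function names (`abc_filter`, `sample_observation` and so on) are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, sigma):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def abc_filter(ys, sample_x0, sample_transition, sample_observation,
               epsilon, n_particles=1000, seed=0):
    """Particle filter on the pairs (X_k, Y_k) with a virtual observation.

    Assumes Gaussian virtual noise, so the likelihood of the virtual
    observation y given a pair (x, y_sim) is a Gaussian density in
    y - y_sim with standard deviation epsilon.
    Returns the estimated filtering mean of X_k at each time step.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive for the kernel to have a density")
    rng = random.Random(seed)
    states = [sample_x0(rng) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Extend each particle with a simulated observation Y_k.
        simulated = [sample_observation(x, rng) for x in states]
        # Weight by how close the simulated observation is to the data.
        weights = [gaussian_pdf(y, y_sim, epsilon) for y_sim in simulated]
        if sum(weights) <= 0.0:
            raise ValueError("all particles have zero weight; increase epsilon")
        selected = rng.choices(states, weights=weights, k=n_particles)
        means.append(sum(selected) / n_particles)
        states = [sample_transition(x, rng) for x in selected]
    return means
```

Smaller values of epsilon bring the target closer to the original filtering distribution but concentrate the weights on fewer particles, so the choice is a trade-off between bias and weight degeneracy.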

The nonlinear filtering equation

Bayes' rule for conditional probability gives:

$$p(x_{0},\cdots ,x_{k}|y_{0},\cdots ,y_{k})=\frac{p(y_{0},\cdots ,y_{k}|x_{0},\cdots ,x_{k})\,p(x_{0},\cdots ,x_{k})}{p(y_{0},\cdots ,y_{k})}$$

where

$$\begin{aligned}p(y_{0},\cdots ,y_{k})&=\int p(y_{0},\cdots ,y_{k}|x_{0},\cdots ,x_{k})\,p(x_{0},\cdots ,x_{k})\,dx_{0}\cdots dx_{k}\\p(y_{0},\cdots ,y_{k}|x_{0},\cdots ,x_{k})&=\prod _{l=0}^{k}p(y_{l}|x_{l})\\p(x_{0},\cdots ,x_{k})&=p_{0}(x_{0})\prod _{l=1}^{k}p(x_{l}|x_{l-1})\end{aligned}$$

Like the Kalman filter-based methods mentioned above, particle filters are an approximation, but with enough particles they can be much more accurate. The nonlinear filtering equation is given by the recursion

$$\begin{aligned}p(x_{k}|y_{0},\cdots ,y_{k-1})&\stackrel{\text{updating}}{\longrightarrow }p(x_{k}|y_{0},\cdots ,y_{k})=\frac{p(y_{k}|x_{k})\,p(x_{k}|y_{0},\cdots ,y_{k-1})}{\int p(y_{k}|x'_{k})\,p(x'_{k}|y_{0},\cdots ,y_{k-1})\,dx'_{k}}\\&\stackrel{\text{prediction}}{\longrightarrow }p(x_{k+1}|y_{0},\cdots ,y_{k})=\int p(x_{k+1}|x_{k})\,p(x_{k}|y_{0},\cdots ,y_{k})\,dx_{k}\end{aligned}$$

Eq. 1

with the convention $p(x_{0}|y_{0},\cdots ,y_{k-1})=p(x_{0})$ for k = 0. The nonlinear filtering problem consists in computing these conditional distributions sequentially.
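When the state space is finite, the integrals in the updating-prediction recursion become sums and the recursion can be computed exactly. The sketch below illustrates the two steps in that special case; it is not a particle filter, and the function names and the list-based representation of distributions are assumptions made for the example.

```python
def update(predicted, likelihood):
    """Updating step: posterior proportional to likelihood times prediction.

    predicted[i]  = P(X_k = i | y_0, ..., y_{k-1})
    likelihood[i] = p(y_k | X_k = i) for the observed y_k
    """
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    normalizer = sum(unnormalized)
    if normalizer <= 0.0:
        raise ValueError("observation has zero likelihood under the prediction")
    return [u / normalizer for u in unnormalized]


def predict(filtered, transition):
    """Prediction step, with transition[i][j] = P(X_{k+1} = j | X_k = i)."""
    n = len(filtered)
    return [sum(filtered[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def forward_filter(p0, transition, likelihoods):
    """Run the recursion; likelihoods[k][i] = p(y_k | X_k = i).

    Returns the list of filtering distributions P(X_k = . | y_0, ..., y_k).
    """
    predicted = list(p0)  # convention for k = 0: the prediction is p(x_0)
    filtered_history = []
    for likelihood in likelihoods:
        filtered = update(predicted, likelihood)
        filtered_history.append(filtered)
        predicted = predict(filtered, transition)
    return filtered_history
```

For continuous state spaces these sums are integrals that generally cannot be computed in closed form, which is the situation particle approximations are designed for.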

Feynman-Kac formulation

Main article: Feynman–Kac formula

We fix a time horizon n and a sequence of observations $Y_{0}=y_{0},\cdots ,Y_{n}=y_{n}$, and for each k = 0, ..., n we set:

$$G_{k}(x_{k})=p(y_{k}|x_{k}).$$

In this notation, for any bounded function F on the set of trajectories of $X_{k}$ from the origin k = 0 up to time k = n, we have the Feynman-Kac formula

$$\begin{aligned}\int F(x_{0},\cdots ,x_{n})\,p(x_{0},\cdots ,x_{n}|y_{0},\cdots ,y_{n})\,dx_{0}\cdots dx_{n}&=\frac{\int F(x_{0},\cdots ,x_{n})\left\{\prod \limits _{k=0}^{n}p(y_{k}|x_{k})\right\}p(x_{0},\cdots ,x_{n})\,dx_{0}\cdots dx_{n}}{\int \left\{\prod \limits _{k=0}^{n}p(y_{k}|x_{k})\right\}p(x_{0},\cdots ,x_{n})\,dx_{0}\cdots dx_{n}}\\&=\frac{E\left(F(X_{0},\cdots ,X_{n})\prod \limits _{k=0}^{n}G_{k}(X_{k})\right)}{E\left(\prod \limits _{k=0}^{n}G_{k}(X_{k})\right)}\end{aligned}$$

Feynman-Kac path integration models arise in a variety of scientific disciplines, including computational physics, biology, information theory and computer sciences. Their interpretations are dependent on the application domain. For instance, if we choose the indicator function $G_{n}(x_{n})=1_{A}(x_{n})$ of some subset of the state space, they represent the conditional distribution of a Markov chain given it stays in a given tube; that is, we have:

$$E\left(F(X_{0},\cdots ,X_{n})|X_{0}\in A,\cdots ,X_{n}\in A\right)=\frac{E\left(F(X_{0},\cdots ,X_{n})\prod \limits _{k=0}^{n}G_{k}(X_{k})\right)}{E\left(\prod \limits _{k=0}^{n}G_{k}(X_{k})\right)}$$

and

$$P\left(X_{0}\in A,\cdots ,X_{n}\in A\right)=E\left(\prod \limits _{k=0}^{n}G_{k}(X_{k})\right)$$

as soon as the normalizing constant is strictly positive.
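The two tube formulas above can be checked numerically with plain Monte Carlo on a toy chain. The sketch below assumes a simple symmetric random walk on the integers started at 0, which is a choice made only for this illustration; it uses independent paths with no selection step, so it is not a particle filter, and the function name `tube_estimates` is hypothetical.

```python
import random


def tube_estimates(n_steps, in_tube, f, n_paths=20000, seed=0):
    """Estimate P(X_0..X_n in A) and E(F(X_0..X_n) | X_0..X_n in A).

    in_tube(x) plays the role of the indicator potential G_k = 1_A, and
    f maps a whole path (a list of states) to a number.
    """
    rng = random.Random(seed)
    numerator = 0.0    # accumulates F(path) * prod_k G_k(X_k)
    denominator = 0.0  # accumulates prod_k G_k(X_k)
    for _ in range(n_paths):
        x = 0
        path = [x]
        g_product = 1.0 if in_tube(x) else 0.0
        for _ in range(n_steps):
            x += rng.choice((-1, 1))  # symmetric random walk step
            path.append(x)
            g_product *= 1.0 if in_tube(x) else 0.0
        numerator += f(path) * g_product
        denominator += g_product
    probability = denominator / n_paths
    conditional = numerator / denominator if denominator > 0.0 else float("nan")
    return probability, conditional


prob, cond = tube_estimates(4, lambda x: abs(x) <= 1, lambda path: path[3] ** 2)
```

For this walk and the tube A = {-1, 0, 1}, the exact probability of staying in the tube over four steps is 1/4, so `prob` should be close to 0.25. Most simulated paths leave the tube and contribute nothing to either sum, which becomes increasingly wasteful as the time horizon grows.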

Particle filters

A Genetic type particle algorithm

Initially, such an algorithm starts with N independent random variables $\left(\xi _{0}^{i}\right)_{1\leqslant i\leqslant N}$ with common probability density $p(x_{0})$. The genetic algorithm selection-mutation transitions

$$\xi _{k}:=\left(\xi _{k}^{i}\right)_{1\leqslant i\leqslant N}\stackrel{\text{selection}}{\longrightarrow }\widehat{\xi }_{k}:=\left(\widehat{\xi }_{k}^{i}\right)_{1\leqslant i\leqslant N}\stackrel{\text{mutation}}{\longrightarrow }\xi _{k+1}:=\left(\xi _{k+1}^{i}\right)_{1\leqslant i\leqslant N}$$

mimic/approximate the updating-prediction transitions of the optimal filter evolution (Eq. 1):

  • During the selection-updating transition we sample N (conditionally) independent random variables $\widehat{\xi }_{k}:=\left(\widehat{\xi }_{k}^{i}\right)_{1\leqslant i\leqslant N}$ with common (conditional) distribution
    $$\sum _{i=1}^{N}\frac{p(y_{k}|\xi _{k}^{i})}{\sum _{j=1}^{N}p(y_{k}|\xi _{k}^{j})}\delta _{\xi _{k}^{i}}(dx_{k})$$
    where $\delta _{a}$ stands for the Dirac measure at a given state a.

  • During the mutation-prediction transition, from each selected particle $\widehat{\xi }_{k}^{i}$ we sample independently a transition
    $$\widehat{\xi }_{k}^{i}\longrightarrow \xi _{k+1}^{i}\sim p(x_{k+1}|\widehat{\xi }_{k}^{i}),\qquad i=1,\cdots ,N.$$

In the above displayed formulae $p(y_{k}|\xi _{k}^{i})$ stands for the likelihood function $x_{k}\mapsto p(y_{k}|x_{k})$ evaluated at $x_{k}=\xi _{k}^{i}$, and $p(x_{k+1}|\widehat{\xi }_{k}^{i})$ stands for the conditional density $p(x_{k+1}|x_{k})$ evaluated at $x_{k}=\widehat{\xi }_{k}^{i}$.
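The selection and mutation transitions described above can be sketched for a scalar state as follows. This is a minimal illustration rather than a tuned implementation: it resamples at every step with multinomial selection, and the function names and the callback interface (`sample_x0`, `sample_transition`, `likelihood`) are assumptions made for the example.

```python
import math
import random


def gaussian_pdf(x, mean, sigma):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def genetic_particle_filter(ys, sample_x0, sample_transition, likelihood,
                            n_particles=500, seed=0):
    """Selection-mutation particle algorithm for a scalar hidden state.

    sample_x0(rng)            draws from the initial density p(x_0)
    sample_transition(x, rng) draws from p(x_{k+1} | x_k = x)
    likelihood(y, x)          evaluates p(y_k = y | x_k = x)
    Returns the estimated filtering mean at each time step.
    """
    rng = random.Random(seed)
    particles = [sample_x0(rng) for _ in range(n_particles)]
    filtering_means = []
    for y in ys:
        # Selection: draw N particles with probabilities proportional
        # to their likelihood given the current observation.
        weights = [likelihood(y, x) for x in particles]
        if sum(weights) <= 0.0:
            raise ValueError("all particles have zero likelihood")
        selected = rng.choices(particles, weights=weights, k=n_particles)
        filtering_means.append(sum(selected) / n_particles)
        # Mutation: move each selected particle with the signal transition.
        particles = [sample_transition(x, rng) for x in selected]
    return filtering_means


# Usage on a hypothetical linear-Gaussian model, chosen for illustration only.
estimates = genetic_particle_filter(
    ys=[1.0, 0.7, 1.2],
    sample_x0=lambda rng: rng.gauss(0.0, 1.0),
    sample_transition=lambda x, rng: 0.9 * x + rng.gauss(0.0, 1.0),
    likelihood=lambda y, x: gaussian_pdf(y, x, 1.0),
)
```

Only two ingredients of the model are used: the ability to sample the signal transitions and the ability to evaluate the likelihood function. The transition density itself is never evaluated.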

At each time k, we have the particle approximations

$$\widehat{p}(dx_{k}|y_{0},\cdots ,y_{k}):=\frac{1}{N}\sum _{i=1}^{N}\delta _{\widehat{\xi }_{k}^{i}}(dx_{k})\approx _{N\uparrow \infty }p(dx_{k}|y_{0},\cdots ,y_{k})\approx _{N\uparrow \infty }\sum _{i=1}^{N}\frac{p(y_{k}|\xi _{k}^{i})}{\sum _{j=1}^{N}p(y_{k}|\xi _{k}^{j})}\delta _{\xi _{k}^{i}}(dx_{k})$$

and

$$\widehat{p}(dx_{k}|y_{0},\cdots ,y_{k-1}):=\frac{1}{N}\sum _{i=1}^{N}\delta _{\xi _{k}^{i}}(dx_{k})\approx _{N\uparrow \infty }p(dx_{k}|y_{0},\cdots ,y_{k-1})$$

In the genetic algorithms and evolutionary computing communities, the mutation-selection Markov chain described above is often called the genetic algorithm with proportional selection. Several branching variants, including ones with random population sizes, have also been proposed in the literature.

Monte Carlo principles

Particle methods, like all sampling-based approaches (e.g., Markov Chain Monte Carlo), generate a set of samples that approximate the filtering density

$$p(x_{k}|y_{0},\cdots ,y_{k}).$$

For example, we may have N samples from the approximate posterior distribution of $X_{k}$, where the samples are labeled with superscripts as:

$$\widehat{\xi }_{k}^{1},\cdots ,\widehat{\xi }_{k}^{N}.$$

Then, expectations with respect to the filtering distribution are approximated by

$$\int f(x_{k})\,p(x_{k}|y_{0},\cdots ,y_{k})\,dx_{k}\approx _{N\uparrow \infty }\frac{1}{N}\sum _{i=1}^{N}f\left(\widehat{\xi }_{k}^{i}\right)=\int f(x_{k})\,\widehat{p}(dx_{k}|y_{0},\cdots ,y_{k})$$

Eq. 2

with

$$\widehat{p}(dx_{k}|y_{0},\cdots ,y_{k})=\frac{1}{N}\sum _{i=1}^{N}\delta _{\widehat{\xi }_{k}^{i}}(dx_{k})$$

where $\delta _{a}$ stands for the Dirac measure at a given state a. The function f, in the usual way for Monte Carlo, can give all the moments etc. of the distribution up to some approximation error. When the approximation equation (Eq. 2) is satisfied for any bounded function f we write

$$p(dx_{k}|y_{0},\cdots ,y_{k}):=p(x_{k}|y_{0},\cdots ,y_{k})\,dx_{k}\approx _{N\uparrow \infty }\widehat{p}(dx_{k}|y_{0},\cdots ,y_{k})=\frac{1}{N}\sum _{i=1}^{N}\delta _{\widehat{\xi }_{k}^{i}}(dx_{k})$$

Particle filters can be interpreted as a genetic type particle algorithm evolving with mutation and selection transitions. We can keep track of the ancestral lines

$$\left(\widehat{\xi }_{0,k}^{i},\widehat{\xi }_{1,k}^{i},\cdots ,\widehat{\xi }_{k-1,k}^{i},\widehat{\xi }_{k,k}^{i}\right)$$

of the particles $i=1,\cdots ,N$. The random state $\widehat{\xi }_{l,k}^{i}$, with the lower index l = 0, ..., k, stands for the ancestor of the individual $\widehat{\xi }_{k,k}^{i}=\widehat{\xi }_{k}^{i}$ at level l. In this situation, we have the approximation formula

$$\begin{aligned}\int F(x_{0},\cdots ,x_{k})\,p(x_{0},\cdots ,x_{k}|y_{0},\cdots ,y_{k})\,dx_{0}\cdots dx_{k}&\approx _{N\uparrow \infty }\frac{1}{N}\sum _{i=1}^{N}F\left(\widehat{\xi }_{0,k}^{i},\widehat{\xi }_{1,k}^{i},\cdots ,\widehat{\xi }_{k,k}^{i}\right)\\&=\int F(x_{0},\cdots ,x_{k})\,\widehat{p}(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k})\end{aligned}$$

Eq. 3

with the empirical measure

$$\widehat{p}(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k}):=\frac{1}{N}\sum _{i=1}^{N}\delta _{\left(\widehat{\xi }_{0,k}^{i},\widehat{\xi }_{1,k}^{i},\cdots ,\widehat{\xi }_{k,k}^{i}\right)}(d(x_{0},\cdots ,x_{k}))$$

Here F stands for any bounded function on the path space of the signal. In a more synthetic form (Eq. 3) is equivalent to

$$\begin{aligned}p(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k})&:=p(x_{0},\cdots ,x_{k}|y_{0},\cdots ,y_{k})\,dx_{0}\cdots dx_{k}\\&\approx _{N\uparrow \infty }\widehat{p}(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k})\\&:=\frac{1}{N}\sum _{i=1}^{N}\delta _{\left(\widehat{\xi }_{0,k}^{i},\cdots ,\widehat{\xi }_{k,k}^{i}\right)}(d(x_{0},\cdots ,x_{k}))\end{aligned}$$
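Keeping track of ancestral lines amounts to storing a whole path per particle and copying the path, not just the current state, at each selection. The sketch below illustrates this for a scalar state; the function names and callback interface are assumptions made for the example, and storing full paths is the simplest rather than the most memory-efficient way to do it.

```python
import random


def genealogical_filter(ys, sample_x0, sample_transition, likelihood,
                        n_particles=200, seed=0):
    """Return the N ancestral lines after the last selection step.

    Each returned path has one state per observation: the ancestors of a
    selected particle at levels 0, 1, ..., k, where k = len(ys) - 1.
    """
    rng = random.Random(seed)
    paths = [[sample_x0(rng)] for _ in range(n_particles)]
    for k, y in enumerate(ys):
        weights = [likelihood(y, path[-1]) for path in paths]
        indices = rng.choices(range(n_particles), weights=weights, k=n_particles)
        # Selection copies the complete ancestral line of each chosen particle.
        paths = [list(paths[i]) for i in indices]
        if k < len(ys) - 1:
            # Mutation extends every line by one new state.
            for path in paths:
                path.append(sample_transition(path[-1], rng))
    return paths


def path_average(paths, F):
    """Empirical average of a path functional F over the ancestral lines."""
    return sum(F(path) for path in paths) / len(paths)
```

For example, `path_average(paths, lambda p: p[0])` gives an estimate of the conditional mean of the initial state given all the observations. After many selection steps the lines tend to share the same early ancestors, so such estimates for early time indices rest on few distinct values.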

Particle filters can be interpreted in many different ways. From the probabilistic point of view they coincide with a mean-field particle interpretation of the nonlinear filtering equation. The updating-prediction transitions of the optimal filter evolution can also be interpreted as the classical genetic type selection-mutation transitions of individuals. The sequential importance resampling technique provides another interpretation of the filtering transitions coupling importance sampling with the bootstrap resampling step. Last, but not least, particle filters can be seen as an acceptance-rejection methodology equipped with a recycling mechanism.

Mean-field particle simulation

The general probabilistic principle

The nonlinear filtering evolution can be interpreted as a dynamical system in the set of probability measures of the form $\eta _{n+1}=\Phi _{n+1}\left(\eta _{n}\right)$ where $\Phi _{n+1}$ stands for some mapping from the set of probability distributions into itself. For instance, the evolution of the one-step optimal predictor $\eta _{n}(dx_{n})=p(x_{n}|y_{0},\cdots ,y_{n-1})dx_{n}$ satisfies a nonlinear evolution starting with the probability distribution $\eta _{0}(dx_{0})=p(x_{0})dx_{0}$. One of the simplest ways to approximate these probability measures is to start with N independent random variables $\left(\xi _{0}^{i}\right)_{1\leqslant i\leqslant N}$ with common probability distribution $\eta _{0}(dx_{0})=p(x_{0})dx_{0}$. Suppose we have defined a sequence of N random variables $\left(\xi _{n}^{i}\right)_{1\leqslant i\leqslant N}$ such that

$$\frac{1}{N}\sum _{i=1}^{N}\delta _{\xi _{n}^{i}}(dx_{n})\approx _{N\uparrow \infty }\eta _{n}(dx_{n})$$

At the next step we sample N (conditionally) independent random variables $\xi _{n+1}:=\left(\xi _{n+1}^{i}\right)_{1\leqslant i\leqslant N}$ with common law

$$\Phi _{n+1}\left(\frac{1}{N}\sum _{i=1}^{N}\delta _{\xi _{n}^{i}}\right)\approx _{N\uparrow \infty }\Phi _{n+1}\left(\eta _{n}\right)=\eta _{n+1}$$

A particle interpretation of the filtering equation

We illustrate this mean-field particle principle in the context of the evolution of the one-step optimal predictors

{\displaystyle p(x_{k}|y_{0},\cdots ,y_{k-1})dx_{k}\to p(x_{k+1}|y_{0},\cdots ,y_{k})=\int p(x_{k+1}|x'_{k}){\frac {p(y_{k}|x_{k}')p(x'_{k}|y_{0},\cdots ,y_{k-1})dx'_{k}}{\int p(y_{k}|x''_{k})p(x''_{k}|y_{0},\cdots ,y_{k-1})dx''_{k}}}}

(Eq. 4)

For k = 0 we use the convention {\displaystyle p(x_{0}|y_{0},\cdots ,y_{-1}):=p(x_{0})}.

By the law of large numbers, we have

{\displaystyle {\widehat {p}}(dx_{0})={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\xi _{0}^{i}}(dx_{0})\approx _{N\uparrow \infty }p(x_{0})dx_{0}}

in the sense that

{\displaystyle \int f(x_{0}){\widehat {p}}(dx_{0})={\frac {1}{N}}\sum _{i=1}^{N}f(\xi _{0}^{i})\approx _{N\uparrow \infty }\int f(x_{0})p(x_{0})dx_{0}}

for any bounded function {\displaystyle f}. We further assume that we have constructed a sequence of particles {\displaystyle \left(\xi _{k}^{i}\right)_{1\leqslant i\leqslant N}} at some rank k such that

{\displaystyle {\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1}):={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\xi _{k}^{i}}(dx_{k})\approx _{N\uparrow \infty }~p(x_{k}~|~y_{0},\cdots ,y_{k-1})dx_{k}}

in the sense that for any bounded function {\displaystyle f} we have

{\displaystyle \int f(x_{k}){\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1})={\frac {1}{N}}\sum _{i=1}^{N}f(\xi _{k}^{i})\approx _{N\uparrow \infty }\int f(x_{k})p(x_{k}|y_{0},\cdots ,y_{k-1})dx_{k}}

In this situation, replacing {\displaystyle p(x_{k}|y_{0},\cdots ,y_{k-1})dx_{k}} by the empirical measure {\displaystyle {\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1})} in the evolution equation of the one-step optimal predictor stated in (Eq. 4) we find that

{\displaystyle p(x_{k+1}|y_{0},\cdots ,y_{k})\approx _{N\uparrow \infty }\int p(x_{k+1}|x'_{k}){\frac {p(y_{k}|x_{k}'){\widehat {p}}(dx'_{k}|y_{0},\cdots ,y_{k-1})}{\int p(y_{k}|x''_{k}){\widehat {p}}(dx''_{k}|y_{0},\cdots ,y_{k-1})}}}

Notice that the right-hand side in the above formula is a weighted probability mixture

{\displaystyle \int p(x_{k+1}|x'_{k}){\frac {p(y_{k}|x_{k}'){\widehat {p}}(dx'_{k}|y_{0},\cdots ,y_{k-1})}{\int p(y_{k}|x''_{k}){\widehat {p}}(dx''_{k}|y_{0},\cdots ,y_{k-1})}}=\sum _{i=1}^{N}{\frac {p(y_{k}|\xi _{k}^{i})}{\sum _{j=1}^{N}p(y_{k}|\xi _{k}^{j})}}p(x_{k+1}|\xi _{k}^{i})=:{\widehat {q}}(x_{k+1}|y_{0},\cdots ,y_{k})}

where {\displaystyle p(y_{k}|\xi _{k}^{i})} stands for the density {\displaystyle p(y_{k}|x_{k})} evaluated at {\displaystyle x_{k}=\xi _{k}^{i}}, and {\displaystyle p(x_{k+1}|\xi _{k}^{i})} stands for the density {\displaystyle p(x_{k+1}|x_{k})} evaluated at {\displaystyle x_{k}=\xi _{k}^{i}} for {\displaystyle i=1,\cdots ,N.}

Then, we sample N independent random variables {\displaystyle \left(\xi _{k+1}^{i}\right)_{1\leqslant i\leqslant N}} with common probability density {\displaystyle {\widehat {q}}(x_{k+1}|y_{0},\cdots ,y_{k})} so that

{\displaystyle {\widehat {p}}(dx_{k+1}|y_{0},\cdots ,y_{k}):={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\xi _{k+1}^{i}}(dx_{k+1})\approx _{N\uparrow \infty }{\widehat {q}}(x_{k+1}|y_{0},\cdots ,y_{k})dx_{k+1}\approx _{N\uparrow \infty }p(x_{k+1}|y_{0},\cdots ,y_{k})dx_{k+1}}

Iterating this procedure, we design a Markov chain such that

{\displaystyle {\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1}):={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\xi _{k}^{i}}(dx_{k})\approx _{N\uparrow \infty }p(dx_{k}|y_{0},\cdots ,y_{k-1}):=p(x_{k}|y_{0},\cdots ,y_{k-1})dx_{k}}

Notice that the optimal filter is approximated at each time step k using Bayes' formula

{\displaystyle p(dx_{k}|y_{0},\cdots ,y_{k})\approx _{N\uparrow \infty }{\frac {p(y_{k}|x_{k}){\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1})}{\int p(y_{k}|x'_{k}){\widehat {p}}(dx'_{k}|y_{0},\cdots ,y_{k-1})}}=\sum _{i=1}^{N}{\frac {p(y_{k}|\xi _{k}^{i})}{\sum _{j=1}^{N}p(y_{k}|\xi _{k}^{j})}}~\delta _{\xi _{k}^{i}}(dx_{k})}

The terminology "mean-field approximation" comes from the fact that we replace at each time step the probability measure {\displaystyle p(dx_{k}|y_{0},\cdots ,y_{k-1})} by the empirical approximation {\displaystyle {\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1})}. The mean-field particle approximation of the filtering problem is far from being unique, and several alternative strategies have been developed in the literature.
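The update from one predictor population to the next can be sketched in a few lines of Python. This is only an illustrative sketch: the scalar linear-Gaussian model, the observation values, and the names `predictor_step`, `likelihood` and `sample_transition` are assumptions introduced here, not part of the method's definition. Sampling from the mixture {\displaystyle {\widehat {q}}} is done in two stages, first selecting a parent index with probability proportional to {\displaystyle p(y_{k}|\xi _{k}^{i})} and then drawing from the transition started at that parent.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def predictor_step(particles, y, likelihood, sample_transition, rng):
    """Map particles approximating p(x_k | y_0..y_{k-1}) to particles
    approximating p(x_{k+1} | y_0..y_k) by sampling from the mixture q_hat."""
    # Mixture weights p(y_k | xi_k^i); rng.choices normalizes them internally.
    weights = [likelihood(y, x) for x in particles]
    # Selection: choose N parents with probability proportional to the weights.
    parents = rng.choices(particles, weights=weights, k=len(particles))
    # Mutation: move each selected parent with the transition p(x_{k+1} | x_k).
    return [sample_transition(x, rng) for x in parents]


# Hypothetical toy model, assumed for illustration only:
#   x_{k+1} = 0.9 * x_k + N(0, 0.5^2),   y_k = x_k + N(0, 1),   x_0 ~ N(0, 1)
def likelihood(y, x):
    return gaussian_pdf(y, x, 1.0)


def sample_transition(x, rng):
    return 0.9 * x + rng.gauss(0.0, 0.5)


rng = random.Random(0)
particles = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # xi_0^i ~ p(x_0)
for y in [0.8, 1.1, 0.9]:  # made-up observations y_0, y_1, y_2
    particles = predictor_step(particles, y, likelihood, sample_transition, rng)

# Empirical mean of the one-step predictor p(x_3 | y_0, y_1, y_2).
predictor_mean = sum(particles) / len(particles)
```

For a linear-Gaussian model such as this one the exact predictor is available from the Kalman filter, which makes it a convenient sanity check before moving to nonlinear models.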

Some convergence results

The analysis of the convergence of particle filters began in 1996 and was developed further in 2000 in a book and a series of articles. More recent developments can be found in later books. When the filtering equation is stable (in the sense that it corrects any erroneous initial condition), the bias and the variance of the particle estimates

{\displaystyle I_{k}(f):=\int f(x_{k})p(dx_{k}|y_{0},\cdots ,y_{k-1})\approx _{N\uparrow \infty }{\widehat {I}}_{k}(f):=\int f(x_{k}){\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1})}

are controlled by the non-asymptotic uniform estimates

{\displaystyle \sup _{k\geqslant 0}\left\vert E\left({\widehat {I}}_{k}(f)\right)-I_{k}(f)\right\vert \leqslant {\frac {c_{1}}{N}}}
{\displaystyle \sup _{k\geqslant 0}E\left(\left[{\widehat {I}}_{k}(f)-I_{k}(f)\right]^{2}\right)\leqslant {\frac {c_{2}}{N}}}

for any function f bounded by 1, and for some finite constants {\displaystyle c_{1},c_{2}.} In addition, for any {\displaystyle x\geqslant 0}:

{\displaystyle \mathbf {P} \left(\left|{\widehat {I}}_{k}(f)-I_{k}(f)\right|\leqslant c_{1}{\frac {x}{N}}+c_{2}{\sqrt {\frac {x}{N}}}\land \sup _{0\leqslant k\leqslant n}\left|{\widehat {I}}_{k}(f)-I_{k}(f)\right|\leqslant c{\sqrt {\frac {x\log(n)}{N}}}\right)>1-e^{-x}}

for some finite constants {\displaystyle c_{1},c_{2}} related to the asymptotic bias and variance of the particle estimate, and some finite constant c. The same results hold if the one-step optimal predictor is replaced by the optimal filter approximation.

Genealogical trees and Unbiasedness properties


Genealogical tree based particle smoothing

Tracing back in time the ancestral lines

{\displaystyle \left({\widehat {\xi }}_{0,k}^{i},{\widehat {\xi }}_{1,k}^{i},\cdots ,{\widehat {\xi }}_{k-1,k}^{i},{\widehat {\xi }}_{k,k}^{i}\right),\quad \left(\xi _{0,k}^{i},\xi _{1,k}^{i},\cdots ,\xi _{k-1,k}^{i},\xi _{k,k}^{i}\right)}

of the individuals {\displaystyle {\widehat {\xi }}_{k}^{i}\left(={\widehat {\xi }}_{k,k}^{i}\right)} and {\displaystyle \xi _{k}^{i}\left(={\xi }_{k,k}^{i}\right)} at every time step k, we also have the particle approximations

{\displaystyle {\begin{aligned}{\widehat {p}}(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k})&:={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\left({\widehat {\xi }}_{0,k}^{i},\cdots ,{\widehat {\xi }}_{k,k}^{i}\right)}(d(x_{0},\cdots ,x_{k}))\\&\approx _{N\uparrow \infty }p(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k})\\&\approx _{N\uparrow \infty }\sum _{i=1}^{N}{\frac {p(y_{k}|\xi _{k,k}^{i})}{\sum _{j=1}^{N}p(y_{k}|\xi _{k,k}^{j})}}\delta _{\left(\xi _{0,k}^{i},\cdots ,\xi _{k,k}^{i}\right)}(d(x_{0},\cdots ,x_{k}))\\&\ \\{\widehat {p}}(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k-1})&:={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\left(\xi _{0,k}^{i},\cdots ,\xi _{k,k}^{i}\right)}(d(x_{0},\cdots ,x_{k}))\\&\approx _{N\uparrow \infty }p(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k-1})\\&:=p(x_{0},\cdots ,x_{k}|y_{0},\cdots ,y_{k-1})dx_{0}\cdots dx_{k}\end{aligned}}}

These empirical approximations are equivalent to the particle integral approximations

{\displaystyle {\begin{aligned}\int F(x_{0},\cdots ,x_{k}){\widehat {p}}(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k})&:={\frac {1}{N}}\sum _{i=1}^{N}F\left({\widehat {\xi }}_{0,k}^{i},\cdots ,{\widehat {\xi }}_{k,k}^{i}\right)\\&\approx _{N\uparrow \infty }\int F(x_{0},\cdots ,x_{k})p(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k})\\&\approx _{N\uparrow \infty }\sum _{i=1}^{N}{\frac {p(y_{k}|\xi _{k,k}^{i})}{\sum _{j=1}^{N}p(y_{k}|\xi _{k,k}^{j})}}F\left(\xi _{0,k}^{i},\cdots ,\xi _{k,k}^{i}\right)\\&\ \\\int F(x_{0},\cdots ,x_{k}){\widehat {p}}(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k-1})&:={\frac {1}{N}}\sum _{i=1}^{N}F\left(\xi _{0,k}^{i},\cdots ,\xi _{k,k}^{i}\right)\\&\approx _{N\uparrow \infty }\int F(x_{0},\cdots ,x_{k})p(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k-1})\end{aligned}}}

for any bounded function F on the random trajectories of the signal. The evolution of the genealogical tree coincides with a mean-field particle interpretation of the evolution equations associated with the posterior densities of the signal trajectories. More details on these path space models can be found in the literature.

Unbiased particle estimates of likelihood functions

We use the product formula

{\displaystyle p(y_{0},\cdots ,y_{n})=\prod _{k=0}^{n}p(y_{k}|y_{0},\cdots ,y_{k-1})}

with

{\displaystyle p(y_{k}|y_{0},\cdots ,y_{k-1})=\int p(y_{k}|x_{k})p(dx_{k}|y_{0},\cdots ,y_{k-1})}

and the conventions {\displaystyle p(y_{0}|y_{0},\cdots ,y_{-1})=p(y_{0})} and {\displaystyle p(x_{0}|y_{0},\cdots ,y_{-1})=p(x_{0}),} for k = 0. Replacing {\displaystyle p(x_{k}|y_{0},\cdots ,y_{k-1})dx_{k}} by the empirical approximation

{\displaystyle {\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1}):={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\xi _{k}^{i}}(dx_{k})\approx _{N\uparrow \infty }p(dx_{k}|y_{0},\cdots ,y_{k-1})}

in the above displayed formula, we design the following unbiased particle approximation of the likelihood function

{\displaystyle p(y_{0},\cdots ,y_{n})\approx _{N\uparrow \infty }{\widehat {p}}(y_{0},\cdots ,y_{n})=\prod _{k=0}^{n}{\widehat {p}}(y_{k}|y_{0},\cdots ,y_{k-1})}

with

{\displaystyle {\widehat {p}}(y_{k}|y_{0},\cdots ,y_{k-1})=\int p(y_{k}|x_{k}){\widehat {p}}(dx_{k}|y_{0},\cdots ,y_{k-1})={\frac {1}{N}}\sum _{i=1}^{N}p(y_{k}|\xi _{k}^{i})}

where {\displaystyle p(y_{k}|\xi _{k}^{i})} stands for the density {\displaystyle p(y_{k}|x_{k})} evaluated at {\displaystyle x_{k}=\xi _{k}^{i}}. This particle estimate was designed, and its unbiasedness property proved, in 1996. Refined variance estimates are available in the literature.
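The product formula translates directly into a running sum of logarithms of the average particle likelihoods. The following Python sketch assumes a hypothetical scalar linear-Gaussian model and made-up observations; the function name `particle_log_likelihood` is likewise an assumption made for illustration. It accumulates the logarithm of {\displaystyle {\widehat {p}}(y_{k}|y_{0},\cdots ,y_{k-1})} while propagating the predictor particles.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_log_likelihood(observations, n_particles, rng):
    """Return log p_hat(y_0..y_n), where p_hat is the product over k of
    (1/N) * sum_i p(y_k | xi_k^i).

    Assumed toy model (illustration only):
      x_0 ~ N(0, 1),  x_{k+1} = 0.9 x_k + N(0, 0.5^2),  y_k = x_k + N(0, 1).
    """
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # xi_0^i
    log_estimate = 0.0
    for y in observations:
        likelihoods = [gaussian_pdf(y, x, 1.0) for x in particles]
        # One factor of the product: the average likelihood over the particles.
        log_estimate += math.log(sum(likelihoods) / n_particles)
        # Selection and mutation give the next predictor population.
        parents = rng.choices(particles, weights=likelihoods, k=n_particles)
        particles = [0.9 * x + rng.gauss(0.0, 0.5) for x in parents]
    return log_estimate


rng = random.Random(1)
log_p_hat = particle_log_likelihood([0.8, 1.1, 0.9], 5000, rng)
```

Note that the unbiasedness property applies to the estimate {\displaystyle {\widehat {p}}(y_{0},\cdots ,y_{n})} itself. Its logarithm, which is what the sketch returns for numerical convenience, is in general a biased estimate of the log-likelihood.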

Backward particle smoothers

Using Bayes' rule, we have the formula

{\displaystyle p(x_{0},\cdots ,x_{n}|y_{0},\cdots ,y_{n-1})=p(x_{n}|y_{0},\cdots ,y_{n-1})p(x_{n-1}|x_{n},y_{0},\cdots ,y_{n-1})\cdots p(x_{1}|x_{2},y_{0},y_{1})p(x_{0}|x_{1},y_{0})}

Notice that

{\displaystyle {\begin{aligned}p(x_{k-1}|x_{k},(y_{0},\cdots ,y_{k-1}))&\propto p(x_{k}|x_{k-1})p(x_{k-1}|(y_{0},\cdots ,y_{k-1}))\\p(x_{k-1}|(y_{0},\cdots ,y_{k-1}))&\propto p(y_{k-1}|x_{k-1})p(x_{k-1}|(y_{0},\cdots ,y_{k-2}))\end{aligned}}}

This implies that

{\displaystyle p(x_{k-1}|x_{k},(y_{0},\cdots ,y_{k-1}))={\frac {p(y_{k-1}|x_{k-1})p(x_{k}|x_{k-1})p(x_{k-1}|y_{0},\cdots ,y_{k-2})}{\int p(y_{k-1}|x'_{k-1})p(x_{k}|x'_{k-1})p(x'_{k-1}|y_{0},\cdots ,y_{k-2})dx'_{k-1}}}}

Replacing the one-step optimal predictors {\displaystyle p(x_{k-1}|(y_{0},\cdots ,y_{k-2}))dx_{k-1}} by the particle empirical measures

{\displaystyle {\widehat {p}}(dx_{k-1}|(y_{0},\cdots ,y_{k-2}))={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\xi _{k-1}^{i}}(dx_{k-1})\left(\approx _{N\uparrow \infty }p(dx_{k-1}|(y_{0},\cdots ,y_{k-2})):={p}(x_{k-1}|(y_{0},\cdots ,y_{k-2}))dx_{k-1}\right)}

we find that

{\displaystyle {\begin{aligned}p(dx_{k-1}|x_{k},(y_{0},\cdots ,y_{k-1}))&\approx _{N\uparrow \infty }{\widehat {p}}(dx_{k-1}|x_{k},(y_{0},\cdots ,y_{k-1}))\\&:={\frac {p(y_{k-1}|x_{k-1})p(x_{k}|x_{k-1}){\widehat {p}}(dx_{k-1}|y_{0},\cdots ,y_{k-2})}{\int p(y_{k-1}|x'_{k-1})~p(x_{k}|x'_{k-1}){\widehat {p}}(dx'_{k-1}|y_{0},\cdots ,y_{k-2})}}\\&=\sum _{i=1}^{N}{\frac {p(y_{k-1}|\xi _{k-1}^{i})p(x_{k}|\xi _{k-1}^{i})}{\sum _{j=1}^{N}p(y_{k-1}|\xi _{k-1}^{j})p(x_{k}|\xi _{k-1}^{j})}}\delta _{\xi _{k-1}^{i}}(dx_{k-1})\end{aligned}}}

We conclude that

{\displaystyle p(d(x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))\approx _{N\uparrow \infty }{\widehat {p}}_{backward}(d(x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))}

with the backward particle approximation

{\displaystyle {\begin{aligned}{\widehat {p}}_{backward}(d(x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))={\widehat {p}}(dx_{n}|(y_{0},\cdots ,y_{n-1})){\widehat {p}}(dx_{n-1}|x_{n},(y_{0},\cdots ,y_{n-1}))\cdots {\widehat {p}}(dx_{1}|x_{2},(y_{0},y_{1})){\widehat {p}}(dx_{0}|x_{1},y_{0})\end{aligned}}}

The probability measure

{\displaystyle {\widehat {p}}_{backward}(d(x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))}

is the probability of the random paths of a Markov chain {\displaystyle \left(\mathbb {X} _{k,n}^{\flat }\right)_{0\leqslant k\leqslant n}} running backward in time from time k=n to time k=0, and evolving at each time step k in the state space associated with the population of particles {\displaystyle \xi _{k}^{i},i=1,\cdots ,N.}

  • Initially (at time k=n) the chain {\displaystyle \mathbb {X} _{n,n}^{\flat }} chooses randomly a state with the distribution
{\displaystyle {\widehat {p}}(dx_{n}|(y_{0},\cdots ,y_{n-1}))={\frac {1}{N}}\sum _{i=1}^{N}\delta _{\xi _{n}^{i}}(dx_{n})}
  • From time k to the time (k-1), the chain starting at some state {\displaystyle \mathbb {X} _{k,n}^{\flat }=\xi _{k}^{i}} for some {\displaystyle i=1,\cdots ,N} at time k moves at time (k-1) to a random state {\displaystyle \mathbb {X} _{k-1,n}^{\flat }} chosen with the discrete weighted probability
{\displaystyle {\widehat {p}}(dx_{k-1}|\xi _{k}^{i},(y_{0},\cdots ,y_{k-1}))=\sum _{j=1}^{N}{\frac {p(y_{k-1}|\xi _{k-1}^{j})p(\xi _{k}^{i}|\xi _{k-1}^{j})}{\sum _{l=1}^{N}p(y_{k-1}|\xi _{k-1}^{l})p(\xi _{k}^{i}|\xi _{k-1}^{l})}}~\delta _{\xi _{k-1}^{j}}(dx_{k-1})}

In the above displayed formula, {\displaystyle {\widehat {p}}(dx_{k-1}|\xi _{k}^{i},(y_{0},\cdots ,y_{k-1}))} stands for the conditional distribution {\displaystyle {\widehat {p}}(dx_{k-1}|x_{k},(y_{0},\cdots ,y_{k-1}))} evaluated at {\displaystyle x_{k}=\xi _{k}^{i}}. In the same vein, {\displaystyle p(y_{k-1}|\xi _{k-1}^{j})} and {\displaystyle p(\xi _{k}^{i}|\xi _{k-1}^{j})} stand for the conditional densities {\displaystyle p(y_{k-1}|x_{k-1})} and {\displaystyle p(x_{k}|x_{k-1})} evaluated at {\displaystyle x_{k}=\xi _{k}^{i}} and {\displaystyle x_{k-1}=\xi _{k-1}^{j}.} These models allow integration with respect to the densities {\displaystyle p((x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))} to be reduced to matrix operations with respect to the Markov transitions of the chain described above. For instance, for any function {\displaystyle f_{k}} we have the particle estimates

{\displaystyle {\begin{aligned}\int p(d(x_{0},\cdots ,x_{n})&|(y_{0},\cdots ,y_{n-1}))f_{k}(x_{k})\\&\approx _{N\uparrow \infty }\int {\widehat {p}}_{backward}(d(x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))f_{k}(x_{k})\\&=\int {\widehat {p}}(dx_{n}|(y_{0},\cdots ,y_{n-1})){\widehat {p}}(dx_{n-1}|x_{n},(y_{0},\cdots ,y_{n-1}))\cdots {\widehat {p}}(dx_{k}|x_{k+1},(y_{0},\cdots ,y_{k}))f_{k}(x_{k})\\&=\underbrace {\left[{\tfrac {1}{N}},\cdots ,{\tfrac {1}{N}}\right]} _{N{\text{ times}}}\mathbb {M} _{n-1}\cdots \mathbb {M} _{k}{\begin{bmatrix}f_{k}(\xi _{k}^{1})\\\vdots \\f_{k}(\xi _{k}^{N})\end{bmatrix}}\end{aligned}}}

where

{\displaystyle \mathbb {M} _{k}=(\mathbb {M} _{k}(i,j))_{1\leqslant i,j\leqslant N}:\qquad \mathbb {M} _{k}(i,j)={\frac {p(\xi _{k}^{i}|\xi _{k-1}^{j})~p(y_{k-1}|\xi _{k-1}^{j})}{\sum \limits _{l=1}^{N}p(\xi _{k}^{i}|\xi _{k-1}^{l})p(y_{k-1}|\xi _{k-1}^{l})}}}

This also shows that if

{\displaystyle {\overline {F}}(x_{0},\cdots ,x_{n}):={\frac {1}{n+1}}\sum _{k=0}^{n}f_{k}(x_{k})}

then

{\displaystyle {\begin{aligned}\int {\overline {F}}(x_{0},\cdots ,x_{n})p(d(x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))&\approx _{N\uparrow \infty }\int {\overline {F}}(x_{0},\cdots ,x_{n}){\widehat {p}}_{backward}(d(x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))\\&={\frac {1}{n+1}}\sum _{k=0}^{n}\underbrace {\left[{\tfrac {1}{N}},\cdots ,{\tfrac {1}{N}}\right]} _{N{\text{ times}}}\mathbb {M} _{n-1}\mathbb {M} _{n-2}\cdots \mathbb {M} _{k}{\begin{bmatrix}f_{k}(\xi _{k}^{1})\\\vdots \\f_{k}(\xi _{k}^{N})\end{bmatrix}}\end{aligned}}}
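The backward Markov chain described in this section can be sketched as follows. The code is a hedged illustration, not a reference implementation: the linear-Gaussian toy model, the observation values and the helper names (`forward_predictor_particles`, `backward_matrix`, `sample_backward_path`) are all assumptions introduced here. `backward_matrix` builds the stochastic matrix with entries proportional to {\displaystyle p(\xi _{k}^{i}|\xi _{k-1}^{j})p(y_{k-1}|\xi _{k-1}^{j})}, and `sample_backward_path` draws one trajectory by starting uniformly among the time-n particles and moving backward with the same weights.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Assumed toy model: x_0 ~ N(0,1), x_k = 0.9 x_{k-1} + N(0, 0.5^2), y_k = x_k + N(0,1).
def transition_density(x_next, x):
    return gaussian_pdf(x_next, 0.9 * x, 0.5)


def likelihood(y, x):
    return gaussian_pdf(y, x, 1.0)


def forward_predictor_particles(observations, n_particles, rng):
    """Return [xi_0, ..., xi_n]; xi_k approximates p(x_k | y_0..y_{k-1})."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    history = [particles]
    for y in observations:
        weights = [likelihood(y, x) for x in particles]
        parents = rng.choices(particles, weights=weights, k=n_particles)
        particles = [0.9 * x + rng.gauss(0.0, 0.5) for x in parents]
        history.append(particles)
    return history


def backward_matrix(xi_k, xi_prev, y_prev):
    """Matrix M_k(i, j): probability of moving from xi_k^i to xi_{k-1}^j."""
    rows = []
    for x_k in xi_k:
        row = [transition_density(x_k, xp) * likelihood(y_prev, xp) for xp in xi_prev]
        total = sum(row)
        rows.append([value / total for value in row])
    return rows


def sample_backward_path(history, observations, rng):
    """Draw one path (x_0, ..., x_n) from the backward particle approximation."""
    n = len(history) - 1
    path = [rng.choice(history[n])]  # uniform choice among the time-n particles
    for k in range(n, 0, -1):
        x_k = path[-1]
        weights = [transition_density(x_k, xp) * likelihood(observations[k - 1], xp)
                   for xp in history[k - 1]]
        path.append(rng.choices(history[k - 1], weights=weights, k=1)[0])
    path.reverse()
    return path


rng = random.Random(4)
observations = [0.8, 1.1, 0.9]  # made-up y_0, y_1, y_2
history = forward_predictor_particles(observations, 200, rng)
M = backward_matrix(history[3], history[2], observations[2])  # M_3
path = sample_backward_path(history, observations, rng)
```

Building each matrix costs on the order of N squared density evaluations, which is the main practical price of backward smoothing compared with tracing ancestral lines.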

Some convergence results

We shall assume that the filtering equation is stable, in the sense that it corrects any erroneous initial condition.

In this situation, the particle approximations of the likelihood functions are unbiased and the relative variance is controlled by

{\displaystyle E\left({\widehat {p}}(y_{0},\cdots ,y_{n})\right)=p(y_{0},\cdots ,y_{n}),\qquad E\left(\left[{\frac {{\widehat {p}}(y_{0},\cdots ,y_{n})}{p(y_{0},\cdots ,y_{n})}}-1\right]^{2}\right)\leqslant {\frac {cn}{N}},}

for some finite constant c. In addition, for any {\displaystyle x\geqslant 0}:

{\displaystyle \mathbf {P} \left(\left\vert {\frac {1}{n}}\log {{\widehat {p}}(y_{0},\cdots ,y_{n})}-{\frac {1}{n}}\log {p(y_{0},\cdots ,y_{n})}\right\vert \leqslant c_{1}{\frac {x}{N}}+c_{2}{\sqrt {\frac {x}{N}}}\right)>1-e^{-x}}

for some finite constants {\displaystyle c_{1},c_{2}} related to the asymptotic bias and variance of the particle estimate.

The bias and the variance of the particle estimates based on the ancestral lines of the genealogical trees

{\displaystyle {\begin{aligned}I_{k}^{path}(F)&:=\int F(x_{0},\cdots ,x_{k})p(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k-1})\\&\approx _{N\uparrow \infty }{\widehat {I}}_{k}^{path}(F)\\&:=\int F(x_{0},\cdots ,x_{k}){\widehat {p}}(d(x_{0},\cdots ,x_{k})|y_{0},\cdots ,y_{k-1})\\&={\frac {1}{N}}\sum _{i=1}^{N}F\left(\xi _{0,k}^{i},\cdots ,\xi _{k,k}^{i}\right)\end{aligned}}}

are controlled by the non-asymptotic uniform estimates

{\displaystyle \left|E\left({\widehat {I}}_{k}^{path}(F)\right)-I_{k}^{path}(F)\right|\leqslant {\frac {c_{1}k}{N}},\qquad E\left(\left[{\widehat {I}}_{k}^{path}(F)-I_{k}^{path}(F)\right]^{2}\right)\leqslant {\frac {c_{2}k}{N}},}

for any function F bounded by 1, and for some finite constants {\displaystyle c_{1},c_{2}.} In addition, for any {\displaystyle x\geqslant 0}:

{\displaystyle \mathbf {P} \left(\left|{\widehat {I}}_{k}^{path}(F)-I_{k}^{path}(F)\right|\leqslant c_{1}{\frac {kx}{N}}+c_{2}{\sqrt {\frac {kx}{N}}}\land \sup _{0\leqslant k\leqslant n}\left|{\widehat {I}}_{k}^{path}(F)-I_{k}^{path}(F)\right|\leqslant c{\sqrt {\frac {xn\log(n)}{N}}}\right)>1-e^{-x}}

for some finite constants {\displaystyle c_{1},c_{2}} related to the asymptotic bias and variance of the particle estimate, and for some finite constant c. The same type of bias and variance estimates hold for the backward particle smoothers. For additive functionals of the form

{\displaystyle {\overline {F}}(x_{0},\cdots ,x_{n}):={\frac {1}{n+1}}\sum _{0\leqslant k\leqslant n}f_{k}(x_{k})}

with

{\displaystyle I_{n}^{path}({\overline {F}})\approx _{N\uparrow \infty }{\widehat {I}}_{n}^{\flat ,path}({\overline {F}}):=\int {\overline {F}}(x_{0},\cdots ,x_{n}){\widehat {p}}_{backward}(d(x_{0},\cdots ,x_{n})|(y_{0},\cdots ,y_{n-1}))}

and functions {\displaystyle f_{k}} bounded by 1, we have

{\displaystyle \sup _{n\geqslant 0}{\left\vert E\left({\widehat {I}}_{n}^{\flat ,path}({\overline {F}})\right)-I_{n}^{path}({\overline {F}})\right\vert }\leqslant {\frac {c_{1}}{N}}}

and

{\displaystyle E\left(\left[{\widehat {I}}_{n}^{\flat ,path}({\overline {F}})-I_{n}^{path}({\overline {F}})\right]^{2}\right)\leqslant {\frac {c_{2}}{nN}}+{\frac {c_{3}}{N^{2}}}}

for some finite constants {\displaystyle c_{1},c_{2},c_{3}.} More refined estimates, including exponentially small probabilities of error, have also been developed.

Sequential Importance Resampling (SIR)

Monte Carlo filter and bootstrap filter

Sequential Importance Resampling (SIR), Monte Carlo filtering (Kitagawa 1993), the bootstrap filtering algorithm (Gordon et al. 1993) and single distribution resampling (Bejuri W.M.Y.B et al. 2017) are also commonly applied filtering algorithms, which approximate the filtering probability density {\displaystyle p(x_{k}|y_{0},\cdots ,y_{k})} by a weighted set of N samples

{\displaystyle \left\{\left(w_{k}^{(i)},x_{k}^{(i)}\right)\ :\ i\in \{1,\cdots ,N\}\right\}.}

The importance weights {\displaystyle w_{k}^{(i)}} are approximations to the relative posterior probabilities (or densities) of the samples such that

{\displaystyle \sum _{i=1}^{N}w_{k}^{(i)}=1.}

Sequential importance sampling (SIS) is a sequential (i.e., recursive) version of importance sampling. As in importance sampling, the expectation of a function f can be approximated as a weighted average

{\displaystyle \int f(x_{k})p(x_{k}|y_{0},\dots ,y_{k})dx_{k}\approx \sum _{i=1}^{N}w_{k}^{(i)}f(x_{k}^{(i)}).}
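As a minimal illustration of the weighted average above, the Python sketch below estimates the mean of a target density from samples drawn from a different proposal. The specific target N(1, 1), proposal N(0, 4) and the name `self_normalized_estimate` are assumptions chosen for the example. The raw weights are the ratio of target to proposal densities and are normalized to sum to one before averaging.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def self_normalized_estimate(samples, unnormalized_weights, f):
    """Return sum_i w_i f(x_i), with the weights w_i normalized to sum to 1."""
    total = sum(unnormalized_weights)
    return sum((w / total) * f(x) for x, w in zip(samples, unnormalized_weights))


rng = random.Random(2)
# Samples from the proposal N(0, 2^2), which is wider than the target N(1, 1).
samples = [rng.gauss(0.0, 2.0) for _ in range(20000)]
# Importance weights: target density divided by proposal density.
raw_weights = [gaussian_pdf(x, 1.0, 1.0) / gaussian_pdf(x, 0.0, 2.0) for x in samples]
posterior_mean = self_normalized_estimate(samples, raw_weights, lambda x: x)
```

Because the weights are normalized, the target density only needs to be known up to a multiplicative constant, which is the situation encountered in filtering.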

For a finite set of samples, the algorithm performance is dependent on the choice of the proposal distribution

{\displaystyle \pi (x_{k}|x_{0:k-1},y_{0:k})\,}.

The "optimal" proposal distribution is given as the target distribution

{\displaystyle \pi (x_{k}|x_{0:k-1},y_{0:k})=p(x_{k}|x_{k-1},y_{k})={\frac {p(y_{k}|x_{k})}{\int p(y_{k}|x_{k})p(x_{k}|x_{k-1})dx_{k}}}~p(x_{k}|x_{k-1}).}

This particular choice of proposal transition was proposed by P. Del Moral in 1996 and 1998. When it is difficult to sample transitions according to the distribution {\displaystyle p(x_{k}|x_{k-1},y_{k})} one natural strategy is to use the following particle approximation

{\displaystyle {\begin{aligned}{\frac {p(y_{k}|x_{k})}{\int p(y_{k}|x_{k})p(x_{k}|x_{k-1})dx_{k}}}p(x_{k}|x_{k-1})dx_{k}&\simeq _{N\uparrow \infty }{\frac {p(y_{k}|x_{k})}{\int p(y_{k}|x_{k}){\widehat {p}}(dx_{k}|x_{k-1})}}{\widehat {p}}(dx_{k}|x_{k-1})\\&=\sum _{i=1}^{N}{\frac {p(y_{k}|X_{k}^{i}(x_{k-1}))}{\sum _{j=1}^{N}p(y_{k}|X_{k}^{j}(x_{k-1}))}}\delta _{X_{k}^{i}(x_{k-1})}(dx_{k})\end{aligned}}}

with the empirical approximation

{\displaystyle {\widehat {p}}(dx_{k}|x_{k-1})={\frac {1}{N}}\sum _{i=1}^{N}\delta _{X_{k}^{i}(x_{k-1})}(dx_{k})~\simeq _{N\uparrow \infty }p(x_{k}|x_{k-1})dx_{k}}

associated with N (or any other large number of samples) independent random samples {\displaystyle X_{k}^{i}(x_{k-1}),i=1,\cdots ,N} with the conditional distribution of the random state {\displaystyle X_{k}} given {\displaystyle X_{k-1}=x_{k-1}}. The consistency of the resulting particle filter, along with other extensions, has been studied in the literature. In the above display {\displaystyle \delta _{a}} stands for the Dirac measure at a given state a.
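The particle approximation of the optimal proposal can be sketched as a two-step draw: simulate several candidate transitions from the given previous state, then select one candidate with probability proportional to its likelihood. The Python sketch below assumes a hypothetical linear-Gaussian model; the function name `sample_approx_optimal_proposal` and the parameter `n_inner` (playing the role of N in the display above) are illustrative choices, not established terminology.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample_approx_optimal_proposal(x_prev, y, n_inner, rng):
    """Draw one state approximately distributed as p(x_k | x_{k-1}, y_k).

    Assumed toy model: x_k = 0.9 x_{k-1} + N(0, 0.5^2), y_k = x_k + N(0, 1).
    """
    # Candidates X_k^i(x_{k-1}) sampled from the transition p(x_k | x_{k-1}).
    candidates = [0.9 * x_prev + rng.gauss(0.0, 0.5) for _ in range(n_inner)]
    # Weights proportional to the likelihood p(y_k | X_k^i(x_{k-1})).
    weights = [gaussian_pdf(y, c, 1.0) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(3)
draws = [sample_approx_optimal_proposal(0.0, 2.0, 100, rng) for _ in range(2000)]
draw_mean = sum(draws) / len(draws)
```

For this assumed model the exact optimal proposal given a previous state of 0 and an observation of 2 is Gaussian with mean 0.4 and variance 0.2, so the empirical mean of the draws gives a quick check of the approximation.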

However, the transition prior probability distribution is often used as the importance function, since it is easier to draw particles (or samples) and perform subsequent importance weight calculations:

{\displaystyle \pi (x_{k}|x_{0:k-1},y_{0:k})=p(x_{k}|x_{k-1}).}

Sequential Importance Resampling (SIR) filters with the transition prior probability distribution as importance function are commonly known as the bootstrap filter and the condensation algorithm.

Resampling is used to avoid the problem of the degeneracy of the algorithm, that is, to avoid the situation in which all but one of the importance weights are close to zero. The performance of the algorithm can also be affected by the choice of resampling method. The stratified sampling proposed by Kitagawa (1993) is optimal in terms of variance.
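A common way to implement stratified resampling is to split the unit interval into N equal strata, draw one uniform number in each stratum, and read off the particle whose cumulative weight interval contains that number. The Python sketch below follows this standard construction; it is offered as an illustration under that assumption and is not taken from Kitagawa's paper. The function name `stratified_resample` is hypothetical.

```python
import random


def stratified_resample(weights, rng):
    """Return N ancestor indices using one uniform draw per stratum.

    The weights need not be normalized; they are divided by their sum.
    """
    n = len(weights)
    total = sum(weights)
    indices = []
    j = 0
    cumulative = weights[0] / total  # running cumulative weight up to index j
    for i in range(n):
        # One uniform point in the i-th stratum [i/n, (i+1)/n).
        u = (i + rng.random()) / n
        # Advance until the cumulative weight covers u.
        while u > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices


rng = random.Random(5)
ancestors = stratified_resample([0.1, 0.2, 0.3, 0.4], rng)
```

Because the uniform points are increasing, the returned indices are sorted, and a single pass over the cumulative weights is enough.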

A single step of sequential importance resampling is as follows:

1) For $i=1,\cdots,N$ draw samples from the proposal distribution
$$x_k^{(i)} \sim \pi(x_k|x_{0:k-1}^{(i)},y_{0:k})$$
2) For $i=1,\cdots,N$ update the importance weights up to a normalizing constant:
$$\hat{w}_k^{(i)} = w_{k-1}^{(i)}\,\frac{p(y_k|x_k^{(i)})\,p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi(x_k^{(i)}|x_{0:k-1}^{(i)},y_{0:k})}.$$
Note that when we use the transition prior probability distribution as the importance function,
$$\pi(x_k^{(i)}|x_{0:k-1}^{(i)},y_{0:k}) = p(x_k^{(i)}|x_{k-1}^{(i)}),$$
this simplifies to the following:
$$\hat{w}_k^{(i)} = w_{k-1}^{(i)}\,p(y_k|x_k^{(i)}),$$
3) For $i=1,\cdots,N$ compute the normalized importance weights:
$$w_k^{(i)} = \frac{\hat{w}_k^{(i)}}{\sum_{j=1}^{N}\hat{w}_k^{(j)}}$$
4) Compute an estimate of the effective number of particles as
$$\hat{N}_{\mathit{eff}} = \frac{1}{\sum_{i=1}^{N}\left(w_k^{(i)}\right)^2}$$
This criterion reflects the variance of the weights. Other criteria, together with their rigorous analysis and central limit theorems, can be found in the literature.
5) If the effective number of particles is less than a given threshold, $\hat{N}_{\mathit{eff}} < N_{thr}$, then perform resampling:
a) Draw $N$ particles from the current particle set with probabilities proportional to their weights. Replace the current particle set with this new one.
b) For $i=1,\cdots,N$ set $w_k^{(i)} = 1/N.$
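The five steps above can be sketched in Python for the bootstrap case, in which the proposal is the transition prior. The scalar linear-Gaussian model, its parameters `a`, `q`, `r`, the default threshold $N_{thr} = N/2$, and the use of plain multinomial resampling through `random.Random.choices` are all assumptions chosen for this illustration rather than prescriptions from the article.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def sir_step(particles, weights, y, rng, a=0.9, q=1.0, r=0.5, n_thr=None):
    """One SIR step with the transition prior as importance function.

    Assumed toy model: x_k = a * x_{k-1} + N(0, q^2), y_k = x_k + N(0, r^2).
    """
    n = len(particles)
    if n_thr is None:
        n_thr = n / 2.0  # assumed threshold, a common rule of thumb
    # 1) Draw from the proposal, here p(x_k | x_{k-1}).
    particles = [a * x + rng.gauss(0.0, q) for x in particles]
    # 2) Update the weights up to a constant: w_{k-1} * p(y_k | x_k).
    unnormalised = [w * gaussian_pdf(y, x, r) for w, x in zip(weights, particles)]
    # 3) Normalise the weights.
    total = sum(unnormalised)
    weights = [w / total for w in unnormalised]
    # 4) Estimate the effective number of particles.
    n_eff = 1.0 / sum(w * w for w in weights)
    # 5) Resample (multinomially) if the estimate falls below the threshold.
    if n_eff < n_thr:
        particles = rng.choices(particles, weights=weights, k=n)
        weights = [1.0 / n] * n
    return particles, weights, n_eff

rng = random.Random(2)
n = 500
particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
weights = [1.0 / n] * n
for y in [0.3, 0.8, 1.1]:
    particles, weights, n_eff = sir_step(particles, weights, y, rng)
posterior_mean = sum(w * x for w, x in zip(weights, particles))
print(posterior_mean, n_eff)
```

The returned `n_eff` is the value computed in step 4, before any resampling, so it can be logged to see how often the threshold is crossed.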

The term "Sampling Importance Resampling" is also sometimes used when referring to SIR filters, but the term Importance Resampling is more accurate because the word "resampling" implies that the initial sampling has already been done.

Sequential importance sampling (SIS)

  • Sequential importance sampling is the same as sequential importance resampling, but without the resampling stage.
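The effect of omitting the resampling stage can be seen by tracking the effective number of particles over time. The sketch below is a hedged illustration on an assumed scalar linear-Gaussian model (parameters `a`, `q`, `r` and the function name `sis_ess_history` are hypothetical): it runs sequential importance sampling with the transition prior as proposal and records $\hat{N}_{\mathit{eff}}$ after each observation.

```python
import math
import random

def sis_ess_history(observations, n, rng, a=0.9, q=1.0, r=1.0):
    """Run SIS (no resampling) and return the effective sample size per step.

    Assumed toy model: x_k = a * x_{k-1} + N(0, q^2), y_k = x_k + N(0, r^2).
    """
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    weights = [1.0 / n] * n
    history = []
    for y in observations:
        # Propagate every particle through the transition prior.
        particles = [a * x + rng.gauss(0.0, q) for x in particles]
        # Gaussian likelihood up to a constant; the constant cancels
        # when the weights are normalised.
        unnormalised = [
            w * math.exp(-0.5 * ((y - x) / r) ** 2)
            for w, x in zip(weights, particles)
        ]
        total = sum(unnormalised)
        weights = [w / total for w in unnormalised]
        history.append(1.0 / sum(w * w for w in weights))
    return history

rng = random.Random(3)
# Simulate a short trajectory and noisy observations from the same toy model.
x, observations = 0.0, []
for _ in range(60):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    observations.append(x + rng.gauss(0.0, 1.0))

ess_history = sis_ess_history(observations, n=200, rng=rng)
print(round(ess_history[0], 1), round(ess_history[-1], 1))
```

In runs of this kind the effective sample size typically falls towards one as more observations are processed, which is the weight degeneracy that the resampling step of SIR is meant to counter.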

"Direct version" algorithm


The "direct version" algorithm is rather simple (compared to other particle filtering algorithms) and it uses composition and rejection. To generate a single sample $x$ at $k$ from $p_{x_k|y_{1:k}}(x|y_{1:k})$:

1) Set $n=0$ (this counts the number of particles generated so far)
2) Uniformly choose an index $i$ from the range $\{1,\ldots,N\}$
3) Generate a test $\hat{x}$ from the distribution $p(x_k|x_{k-1})$ with $x_{k-1}=x_{k-1|k-1}^{(i)}$
4) Compute the probability $p(\hat{y})$ from $p(y_k|x_k)$ with $x_k=\hat{x}$, where $y_k$ is the measured value
5) Generate another uniform $u$ from $[0,m_k]$, where $m_k=\sup_{x_k}p(y_k|x_k)$
6) Compare $u$ and $p(\hat{y})$
6a) If $u$ is larger, then repeat from step 2
6b) If $u$ is smaller, then increment $n$ and save $\hat{x}$ as $x_{k|k}^{(n)}$
7) If $n=N$, then quit; otherwise repeat from step 2

The goal is to generate $N$ "particles" at $k$ using only the particles from $k-1$. This requires that a Markov equation can be written (and computed) to generate an $x_k$ based only upon $x_{k-1}$. This algorithm uses the composition of the $N$ particles from $k-1$ to generate a particle at $k$ and repeats (steps 2–6) until $N$ particles are generated at $k$.

This can be more easily visualized if $x$ is viewed as a two-dimensional array. One dimension is $k$ and the other dimension is the particle number. For example, $x(k,i)$ would be the $i$-th particle at $k$ and can also be written $x_k^{(i)}$ (as done above in the algorithm). Step 3 generates a potential $x_k$ based on a randomly chosen particle ($x_{k-1}^{(i)}$) at time $k-1$ and rejects or accepts it in step 6. In other words, the $x_k$ values are generated using the previously generated $x_{k-1}$.
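The composition-and-rejection loop described above can be sketched in Python as follows. The scalar linear-Gaussian model and its parameters `a`, `q`, `r` are assumptions made for illustration; for a Gaussian likelihood the bound $m_k$ is attained at $x_k = y_k$, which the sketch uses directly. The previous particles are taken to be equally weighted, as the uniform choice of index in step 2 presupposes.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def direct_version_step(prev_particles, y, rng, a=0.9, q=1.0, r=0.5):
    """Generate len(prev_particles) new particles by composition and rejection.

    Assumed toy model: x_k = a * x_{k-1} + N(0, q^2), y_k = x_k + N(0, r^2).
    """
    n_target = len(prev_particles)
    # m_k = sup over x_k of p(y_k | x_k); for a Gaussian this is the peak value.
    m_k = gaussian_pdf(y, y, r)
    new_particles = []                              # step 1: none generated yet
    while len(new_particles) < n_target:            # step 7: stop at N particles
        parent = rng.choice(prev_particles)         # step 2: uniform index
        x_hat = a * parent + rng.gauss(0.0, q)      # step 3: test value
        p_y = gaussian_pdf(y, x_hat, r)             # step 4: likelihood of y_k
        u = rng.uniform(0.0, m_k)                   # step 5: uniform on [0, m_k]
        if u <= p_y:                                # step 6: accept or reject
            new_particles.append(x_hat)
    return new_particles

rng = random.Random(4)
prev_particles = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = 1.0
new_particles = direct_version_step(prev_particles, y, rng)
mean = sum(new_particles) / len(new_particles)

# Closed-form posterior mean for this toy model with a N(0, 1) prior at k-1.
predicted_var = 0.9**2 * 1.0 + 1.0**2
exact = predicted_var / (predicted_var + 0.5**2) * y
print(mean, exact)
```

Because accepted particles are unweighted, no normalisation or resampling step is needed; the cost is that the number of rejected proposals grows when the likelihood is sharply peaked relative to the predictive distribution.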

Applications

Particle filters and Feynman-Kac particle methodologies find application in several contexts as an effective means of tackling noisy observations or strong nonlinearities.

Other particle filters

See also

References

  1. Wills, Adrian G.; Schön, Thomas B. (3 May 2023). "Sequential Monte Carlo: A Unified Review". Annual Review of Control, Robotics, and Autonomous Systems. 6 (1): 159–182. doi:10.1146/annurev-control-042920-015119. ISSN 2573-5144. S2CID 255638127.
  2. Del Moral, Pierre (1996). "Non Linear Filtering: Interacting Particle Solution" (PDF). Markov Processes and Related Fields. 2 (4): 555–580.
  3. Liu, Jun S.; Chen, Rong (1998-09-01). "Sequential Monte Carlo Methods for Dynamic Systems". Journal of the American Statistical Association. 93 (443): 1032–1044. doi:10.1080/01621459.1998.10473765. ISSN 0162-1459.
  4. Del Moral, Pierre (1998). "Measure Valued Processes and Interacting Particle Systems. Application to Non Linear Filtering Problems". Annals of Applied Probability. 8 (2) (Publications du Laboratoire de Statistique et Probabilités, 96-15 (1996) ed.): 438–495. doi:10.1214/aoap/1028903535.
  5. Del Moral, Pierre (2004). Feynman-Kac formulae. Genealogical and interacting particle approximations. Probability and its Applications. Springer. p. 556. ISBN 978-0-387-20268-6.
  6. Del Moral, Pierre; Doucet, Arnaud; Jasra, Ajay (2012). "On Adaptive Resampling Procedures for Sequential Monte Carlo Methods" (PDF). Bernoulli. 18 (1): 252–278. doi:10.3150/10-bej335. S2CID 4506682.
  7. Del Moral, Pierre (2004). Feynman-Kac formulae. Genealogical and interacting particle approximations. Probability and its Applications. Springer. p. 575. ISBN 978-0-387-20268-6.
  8. Del Moral, Pierre; Miclo, Laurent (2000). "Branching and Interacting Particle Systems Approximations of Feynman-Kac Formulae with Applications to Non-Linear Filtering". In Jacques Azéma; Michel Ledoux; Michel Émery; Marc Yor (eds.). Séminaire de Probabilités XXXIV (PDF). Lecture Notes in Mathematics. Vol. 1729. pp. 1–145. doi:10.1007/bfb0103798. ISBN 978-3-540-67314-9.
  9. Del Moral, Pierre; Miclo, Laurent (2000). "A Moran particle system approximation of Feynman-Kac formulae". Stochastic Processes and Their Applications. 86 (2): 193–216. doi:10.1016/S0304-4149(99)00094-0. S2CID 122757112.
  10. Del Moral, Pierre (2013). Mean field simulation for Monte Carlo integration. Monographs on Statistics & Applied Probability. Chapman & Hall/CRC Press. p. 626.
  11. Del Moral, Pierre; Doucet, Arnaud (2014). "Particle methods: An introduction with applications". ESAIM: Proc. 44: 1–46. doi:10.1051/proc/201444001.
  12. Rosenbluth, Marshall N.; Rosenbluth, Arianna W. (1955). "Monte-Carlo calculations of the average extension of macromolecular chains". J. Chem. Phys. 23 (2): 356–359. Bibcode:1955JChPh..23..356R. doi:10.1063/1.1741967. S2CID 89611599.
  13. Hetherington, Jack H. (1984). "Observations on the statistical iteration of matrices". Phys. Rev. A. 30 (2713): 2713–2719. Bibcode:1984PhRvA..30.2713H. doi:10.1103/PhysRevA.30.2713.
  14. Del Moral, Pierre (2003). "Particle approximations of Lyapunov exponents connected to Schrödinger operators and Feynman-Kac semigroups". ESAIM Probability & Statistics. 7: 171–208. doi:10.1051/ps:2003001.
  15. Assaraf, Roland; Caffarel, Michel; Khelif, Anatole (2000). "Diffusion Monte Carlo Methods with a fixed number of walkers" (PDF). Phys. Rev. E. 61 (4): 4566–4575. Bibcode:2000PhRvE..61.4566A. doi:10.1103/physreve.61.4566. PMID 11088257. Archived from the original (PDF) on 2014-11-07.
  16. Caffarel, Michel; Ceperley, David; Kalos, Malvin (1993). "Comment on Feynman-Kac Path-Integral Calculation of the Ground-State Energies of Atoms". Phys. Rev. Lett. 71 (13): 2159. Bibcode:1993PhRvL..71.2159C. doi:10.1103/physrevlett.71.2159. PMID 10054598.
  17. Ocone, D. L. (January 1, 1999). "Asymptotic stability of beneš filters". Stochastic Analysis and Applications. 17 (6): 1053–1074. doi:10.1080/07362999908809648. ISSN 0736-2994.
  18. Maurel, Mireille Chaleyat; Michel, Dominique (January 1, 1984). "Des resultats de non existence de filtre de dimension finie". Stochastics. 13 (1–2): 83–102. doi:10.1080/17442508408833312. ISSN 0090-9491.
  19. Hajiramezanali, Ehsan; Imani, Mahdi; Braga-Neto, Ulisses; Qian, Xiaoning; Dougherty, Edward R. (2019). "Scalable optimal Bayesian classification of single-cell trajectories under regulatory model uncertainty". BMC Genomics. 20 (Suppl 6): 435. arXiv:1902.03188. Bibcode:2019arXiv190203188H. doi:10.1186/s12864-019-5720-3. PMC 6561847. PMID 31189480.
  20. Cruz, Marcelo G.; Peters, Gareth W.; Shevchenko, Pavel V. (2015-02-27). Fundamental Aspects of Operational Risk and Insurance Analytics: A Handbook of Operational Risk (1 ed.). Wiley. doi:10.1002/9781118573013. ISBN 978-1-118-11839-9.
  21. Peters, Gareth W.; Shevchenko, Pavel V. (2015-02-20). Advances in Heavy Tailed Risk Modeling: A Handbook of Operational Risk (1 ed.). Wiley. doi:10.1002/9781118909560. ISBN 978-1-118-90953-9.
  22. Turing, Alan M. (October 1950). "Computing machinery and intelligence". Mind. LIX (236): 433–460. doi:10.1093/mind/LIX.236.433.
  23. Barricelli, Nils Aall (1954). "Esempi numerici di processi di evoluzione". Methodos: 45–68.
  24. Barricelli, Nils Aall (1957). "Symbiogenetic evolution processes realized by artificial methods". Methodos: 143–182.
  25. Hammersley, J. M.; Morton, K. W. (1954). "Poor Man's Monte Carlo". Journal of the Royal Statistical Society. Series B (Methodological). 16 (1): 23–38. doi:10.1111/j.2517-6161.1954.tb00145.x. JSTOR 2984008.
  26. Barricelli, Nils Aall (1963). "Numerical testing of evolution theories. Part II. Preliminary tests of performance, symbiogenesis and terrestrial life". Acta Biotheoretica. 16 (3–4): 99–126. doi:10.1007/BF01556602. S2CID 86717105.
  27. "Adaptation in Natural and Artificial Systems | The MIT Press". mitpress.mit.edu. Retrieved 2015-06-06.
  28. Fraser, Alex (1957). "Simulation of genetic systems by automatic digital computers. I. Introduction". Aust. J. Biol. Sci. 10 (4): 484–491. doi:10.1071/BI9570484.
  29. Fraser, Alex; Burnell, Donald (1970). Computer Models in Genetics. New York: McGraw-Hill. ISBN 978-0-07-021904-5.
  30. Crosby, Jack L. (1973). Computer Simulation in Genetics. London: John Wiley & Sons. ISBN 978-0-471-18880-3.
  31. Assaraf, Roland; Caffarel, Michel; Khelif, Anatole (2000). "Diffusion Monte Carlo Methods with a fixed number of walkers" (PDF). Phys. Rev. E. 61 (4): 4566–4575. Bibcode:2000PhRvE..61.4566A. doi:10.1103/physreve.61.4566. PMID 11088257. Archived from the original (PDF) on 2014-11-07.
  32. Caffarel, Michel; Ceperley, David; Kalos, Malvin (1993). "Comment on Feynman-Kac Path-Integral Calculation of the Ground-State Energies of Atoms". Phys. Rev. Lett. 71 (13): 2159. Bibcode:1993PhRvL..71.2159C. doi:10.1103/physrevlett.71.2159. PMID 10054598.
  33. Fermi, Enrico; Richtmyer, Robert D. (1948). "Note on census-taking in Monte Carlo calculations" (PDF). LAM. 805 (A). Declassified report Los Alamos Archive.
  34. Kahn, Herman; Harris, Theodore E. (1951). "Estimation of particle transmission by random sampling" (PDF). Natl. Bur. Stand. Appl. Math. Ser. 12: 27–30.
  35. Kitagawa, G. (January 1993). "A Monte Carlo Filtering and Smoothing Method for Non-Gaussian Nonlinear State Space Models" (PDF). Proceedings of the 2nd U.S.-Japan Joint Seminar on Statistical Time Series Analysis: 110–131.
  36. Kitagawa, G. (1996). "Monte carlo filter and smoother for non-Gaussian nonlinear state space models". Journal of Computational and Graphical Statistics. 5 (1): 1–25. doi:10.2307/1390750. JSTOR 1390750.
  37. Gordon, N.J.; Salmond, D.J.; Smith, A.F.M. (April 1993). "Novel approach to nonlinear/non-Gaussian Bayesian state estimation". IEE Proceedings F - Radar and Signal Processing. 140 (2): 107–113. doi:10.1049/ip-f-2.1993.0015. ISSN 0956-375X.
  38. Carvalho, Himilcon; Del Moral, Pierre; Monin, André; Salut, Gérard (July 1997). "Optimal Non-linear Filtering in GPS/INS Integration" (PDF). IEEE Transactions on Aerospace and Electronic Systems. 33 (3): 835. Bibcode:1997ITAES..33..835C. doi:10.1109/7.599254. S2CID 27966240. Archived from the original (PDF) on 2022-11-10. Retrieved 2015-06-01.
  39. P. Del Moral, G. Rigal, and G. Salut. Estimation and nonlinear optimal control : An unified framework for particle solutions
    LAAS-CNRS, Toulouse, Research Report no. 91137, DRET-DIGILOG- LAAS/CNRS contract, April (1991).
  40. P. Del Moral, G. Rigal, and G. Salut. Nonlinear and non-Gaussian particle filters applied to inertial platform repositioning.
    LAAS-CNRS, Toulouse, Research Report no. 92207, STCAN/DIGILOG-LAAS/CNRS Convention STCAN no. A.91.77.013, (94p.) September (1991).
  41. P. Del Moral, G. Rigal, and G. Salut. Estimation and nonlinear optimal control : Particle resolution in filtering and estimation. Experimental results.
    Convention DRET no. 89.34.553.00.470.75.01, Research report no.2 (54p.), January (1992).
  42. P. Del Moral, G. Rigal, and G. Salut. Estimation and nonlinear optimal control : Particle resolution in filtering and estimation. Theoretical results
    Convention DRET no. 89.34.553.00.470.75.01, Research report no.3 (123p.), October (1992).
  43. P. Del Moral, J.-Ch. Noyer, G. Rigal, and G. Salut. Particle filters in radar signal processing : detection, estimation and air targets recognition.
    LAAS-CNRS, Toulouse, Research report no. 92495, December (1992).
  44. P. Del Moral, G. Rigal, and G. Salut. Estimation and nonlinear optimal control : Particle resolution in filtering and estimation.
    Studies on: Filtering, optimal control, and maximum likelihood estimation. Convention DRET no. 89.34.553.00.470.75.01. Research report no.4 (210p.), January (1993).
  45. Crisan, Dan; Gaines, Jessica; Lyons, Terry (1998). "Convergence of a branching particle method to the solution of the Zakai". SIAM Journal on Applied Mathematics. 58 (5): 1568–1590. doi:10.1137/s0036139996307371. S2CID 39982562.
  46. Crisan, Dan; Lyons, Terry (1997). "Nonlinear filtering and measure-valued processes". Probability Theory and Related Fields. 109 (2): 217–244. doi:10.1007/s004400050131. S2CID 119809371.
  47. Crisan, Dan; Lyons, Terry (1999). "A particle approximation of the solution of the Kushner–Stratonovitch equation". Probability Theory and Related Fields. 115 (4): 549–578. doi:10.1007/s004400050249. S2CID 117725141.
  48. Crisan, Dan; Del Moral, Pierre; Lyons, Terry (1999). "Discrete filtering using branching and interacting particle systems" (PDF). Markov Processes and Related Fields. 5 (3): 293–318.
  49. Del Moral, Pierre; Guionnet, Alice (1999). "On the stability of Measure Valued Processes with Applications to filtering". C. R. Acad. Sci. Paris. 39 (1): 429–434.
  50. Del Moral, Pierre; Guionnet, Alice (2001). "On the stability of interacting processes with applications to filtering and genetic algorithms". Annales de l'Institut Henri Poincaré. 37 (2): 155–194. Bibcode:2001AIHPB..37..155D. doi:10.1016/s0246-0203(00)01064-5. Archived from the original on 2014-11-07.
  51. Del Moral, P.; Guionnet, A. (1999). "Central limit theorem for nonlinear filtering and interacting particle systems". The Annals of Applied Probability. 9 (2): 275–297. doi:10.1214/aoap/1029962742. ISSN 1050-5164.
  52. Del Moral, Pierre; Miclo, Laurent (2001). "Genealogies and Increasing Propagation of Chaos For Feynman-Kac and Genetic Models". The Annals of Applied Probability. 11 (4): 1166–1198. doi:10.1214/aoap/1015345399. ISSN 1050-5164.
  53. Doucet, A.; De Freitas, N.; Murphy, K.; Russell, S. (2000). Rao–Blackwellised particle filtering for dynamic Bayesian networks. Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence. pp. 176–183. CiteSeerX 10.1.1.137.5199.
  54. Del Moral, Pierre; Miclo, Laurent (2001). "Genealogies and Increasing Propagations of Chaos for Feynman-Kac and Genetic Models". Annals of Applied Probability. 11 (4): 1166–1198.
  55. Del Moral, Pierre; Doucet, Arnaud; Singh, Sumeetpal S. (2010). "A Backward Particle Interpretation of Feynman-Kac Formulae" (PDF). M2AN. 44 (5): 947–976. doi:10.1051/m2an/2010048. S2CID 14758161.
  56. Vergé, Christelle; Dubarry, Cyrille; Del Moral, Pierre; Moulines, Eric (2013). "On parallel implementation of Sequential Monte Carlo methods: the island particle model". Statistics and Computing. 25 (2): 243–260. arXiv:1306.3911. Bibcode:2013arXiv1306.3911V. doi:10.1007/s11222-013-9429-x. S2CID 39379264.
  57. Chopin, Nicolas; Jacob, Pierre E.; Papaspiliopoulos, Omiros (2011). "SMC^2: an efficient algorithm for sequential analysis of state-space models". arXiv:1101.1528v3.
  58. Andrieu, Christophe; Doucet, Arnaud; Holenstein, Roman (2010). "Particle Markov chain Monte Carlo methods". Journal of the Royal Statistical Society, Series B. 72 (3): 269–342. doi:10.1111/j.1467-9868.2009.00736.x.
  59. Del Moral, Pierre; Patras, Frédéric; Kohn, Robert (2014). "On Feynman-Kac and particle Markov chain Monte Carlo models". arXiv:1404.5733.
  60. Del Moral, Pierre; Doucet, Arnaud; Jasra, Ajay (2006). "Sequential Monte Carlo Samplers". Journal of the Royal Statistical Society. Series B (Statistical Methodology). 68 (3): 411–436. arXiv:cond-mat/0212648. doi:10.1111/j.1467-9868.2006.00553.x. ISSN 1369-7412. JSTOR 3879283.
  61. Peters, Gareth (2005). "Topics in Sequential Monte Carlo Samplers". SSRN Electronic Journal. doi:10.2139/ssrn.3785582. ISSN 1556-5068.
  62. Del Moral, Pierre; Doucet, Arnaud; Peters, Gareth (2004). "Sequential Monte Carlo Samplers CUED Technical Report". SSRN Electronic Journal. doi:10.2139/ssrn.3841065. ISSN 1556-5068.
  63. Sisson, S. A.; Fan, Y.; Beaumont, M. A., eds. (2019). Handbook of approximate Bayesian computation. Boca Raton: CRC Press, Taylor and Francis Group. ISBN 978-1-315-11719-5.
  64. Peters, Gareth W.; Wüthrich, Mario V.; Shevchenko, Pavel V. (2010-08-01). "Chain ladder method: Bayesian bootstrap versus classical bootstrap". Insurance: Mathematics and Economics. 47 (1): 36–51. arXiv:1004.2548. doi:10.1016/j.insmatheco.2010.03.007. ISSN 0167-6687.
  65. Del Moral, Pierre; Jacod, Jean; Protter, Philip (2001-07-01). "The Monte-Carlo method for filtering with discrete-time observations". Probability Theory and Related Fields. 120 (3): 346–368. doi:10.1007/PL00008786. hdl:1813/9179. ISSN 0178-8051. S2CID 116274.
  66. Del Moral, Pierre; Doucet, Arnaud; Jasra, Ajay (2011). "An adaptive sequential Monte Carlo method for approximate Bayesian computation". Statistics and Computing. 22 (5): 1009–1020. CiteSeerX 10.1.1.218.9800. doi:10.1007/s11222-011-9271-y. ISSN 0960-3174. S2CID 4514922.
  67. Martin, James S.; Jasra, Ajay; Singh, Sumeetpal S.; Whiteley, Nick; Del Moral, Pierre; McCoy, Emma (May 4, 2014). "Approximate Bayesian Computation for Smoothing". Stochastic Analysis and Applications. 32 (3): 397–420. arXiv:1206.5208. doi:10.1080/07362994.2013.879262. ISSN 0736-2994. S2CID 17117364.
  68. Del Moral, Pierre; Rio, Emmanuel (2011). "Concentration inequalities for mean field particle models". The Annals of Applied Probability. 21 (3): 1017–1052. arXiv:1211.1837. doi:10.1214/10-AAP716. ISSN 1050-5164. S2CID 17693884.
  69. Del Moral, Pierre; Hu, Peng; Wu, Liming (2012). On the Concentration Properties of Interacting Particle Processes. Hanover, MA, USA: Now Publishers Inc. ISBN 978-1601985125.
  70. Bejuri, Wan Mohd Yaakob Wan; Mohamad, Mohd Murtadha; Raja Mohd Radzi, Raja Zahilah; Salleh, Mazleena; Yusof, Ahmad Fadhil (2017-10-18). "Adaptive memory-based single distribution resampling for particle filter". Journal of Big Data. 4 (1): 33. doi:10.1186/s40537-017-0094-3. ISSN 2196-1115. S2CID 256407088.
  71. Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. (2013). Bayesian Data Analysis, Third Edition. Chapman and Hall/CRC. ISBN 978-1-4398-4095-5.
  72. Creal, Drew (2012). "A Survey of Sequential Monte Carlo Methods for Economics and Finance". Econometric Reviews. 31 (2): 245–296. doi:10.1080/07474938.2011.607333. hdl:1871/15287. S2CID 2730761.
  73. Moss, Robert; Zarebski, Alexander; Dawson, Peter; McCaw, James M. (2016). "Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data". Influenza and Other Respiratory Viruses. 10 (4): 314–323. doi:10.1111/irv.12376. PMC 4910172. PMID 26859411.
  74. Shen, Yin; Xiangping, Zhu (2015). "Intelligent Particle Filter and Its Application to Fault Detection of Nonlinear System". IEEE Transactions on Industrial Electronics. 62 (6): 1. doi:10.1109/TIE.2015.2399396. S2CID 23951880.
  75. D'Amato, Edigio; Notaro, Immacolata; Nardi, Vito Antonio; Scordamaglia, Valerio (2021). "A Particle Filtering Approach for Fault Detection and Isolation of UAV IMU Sensors: Design, Implementation and Sensitivity Analysis". Sensors. 21 (9): 3066. Bibcode:2021Senso..21.3066D. doi:10.3390/s21093066. PMC 8124649. PMID 33924891.
  76. Kadirkamanathan, V.; Li, P.; Jaward, M. H.; Fabri, S. G. (2002). "Particle filtering-based fault detection in non-linear stochastic systems". International Journal of Systems Science. 33 (4): 259–265. doi:10.1080/00207720110102566. S2CID 28634585.
  77. Bonate P: Pharmacokinetic-Pharmacodynamic Modeling and Simulation. Berlin: Springer; 2011.
  78. Dieter Fox, Wolfram Burgard, Frank Dellaert, and Sebastian Thrun, "Monte Carlo Localization: Efficient Position Estimation for Mobile Robots." Proc. of the Sixteenth National Conference on Artificial Intelligence John Wiley & Sons Ltd, 1999.
  79. Sebastian Thrun, Wolfram Burgard, Dieter Fox. Probabilistic Robotics MIT Press, 2005. Ch. 8.3 ISBN 9780262201629.
  80. Sebastian Thrun, Dieter Fox, Wolfram Burgard, Frank Dellaert. "Robust monte carlo localization for mobile robots." Artificial Intelligence 128.1 (2001): 99–141.
  81. Abbasi, Mahdi; Khosravi, Mohammad R. (2020). "A Robust and Accurate Particle Filter-Based Pupil Detection Method for Big Datasets of Eye Video". Journal of Grid Computing. 18 (2): 305–325. doi:10.1007/s10723-019-09502-1. S2CID 209481431.
  82. Pitt, M.K.; Shephard, N. (1999). "Filtering Via Simulation: Auxiliary Particle Filters". Journal of the American Statistical Association. 94 (446): 590–591. doi:10.2307/2670179. JSTOR 2670179. Archived from the original on 2007-10-16. Retrieved 2008-05-06.
  83. Zand, G.; Taherkhani, M.; Safabakhsh, R. (2015). "Exponential Natural Particle Filter". arXiv:1511.06603.
  84. Canton-Ferrer, C.; Casas, J.R.; Pardàs, M. (2011). "Human Motion Capture Using Scalable Body Models". Computer Vision and Image Understanding. 115 (10): 1363–1374. doi:10.1016/j.cviu.2011.06.001. hdl:2117/13393.
  85. Akyildiz, Ömer Deniz; Míguez, Joaquín (2020-03-01). "Nudging the particle filter". Statistics and Computing. 30 (2): 305–330. doi:10.1007/s11222-019-09884-y. hdl:10044/1/100011. ISSN 1573-1375. S2CID 88515918.
  86. Liu, J.; Wang, W.; Ma, F. (2011). "A Regularized Auxiliary Particle Filtering Approach for System State Estimation and Battery Life Prediction". Smart Materials and Structures. 20 (7): 1–9. Bibcode:2011SMaS...20g5021L. doi:10.1088/0964-1726/20/7/075021. S2CID 110670991.
  87. Blanco, J.L.; Gonzalez, J.; Fernandez-Madrigal, J.A. (2008). An Optimal Filtering Algorithm for Non-Parametric Observation Models in Robot Localization. IEEE International Conference on Robotics and Automation (ICRA'08). pp. 461–466. CiteSeerX 10.1.1.190.7092.
  88. Blanco, J.L.; Gonzalez, J.; Fernandez-Madrigal, J.A. (2010). "Optimal Filtering for Non-Parametric Observation Models: Applications to Localization and SLAM". The International Journal of Robotics Research. 29 (14): 1726–1742. CiteSeerX 10.1.1.1031.4931. doi:10.1177/0278364910364165. S2CID 453697.
