Misplaced Pages

Integrated nested Laplace approximations

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
Bayesian inference method
Part of a series on
Bayesian statistics
Posterior = Likelihood × Prior ÷ Evidence
Background
Model building
Posterior approximation
Estimators
Evidence approximation
Model evaluation

Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's method. It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative for Markov chain Monte Carlo methods to compute posterior marginal distributions. Due to its relative speed even with large data sets for certain problems and models, INLA has been a popular inference method in applied statistics, in particular spatial statistics, ecology, and epidemiology. It is also possible to combine INLA with a finite element method solution of a stochastic partial differential equation to study e.g. spatial point processes and species distribution models. The INLA method is implemented in the R-INLA R package.

Latent Gaussian models

Let y = ( y 1 , , y n ) {\displaystyle {\boldsymbol {y}}=(y_{1},\dots ,y_{n})} denote the response variable (that is, the observations) which belongs to an exponential family, with the mean μ i {\displaystyle \mu _{i}} (of y i {\displaystyle y_{i}} ) being linked to a linear predictor η i {\displaystyle \eta _{i}} via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector x {\displaystyle {\boldsymbol {x}}} . The hyperparameters of the model are denoted by θ {\displaystyle {\boldsymbol {\theta }}} . As per Bayesian statistics, x {\displaystyle {\boldsymbol {x}}} and θ {\displaystyle {\boldsymbol {\theta }}} are random variables with prior distributions.

The observations are assumed to be conditionally independent given x {\displaystyle {\boldsymbol {x}}} and θ {\displaystyle {\boldsymbol {\theta }}} : π ( y | x , θ ) = i I π ( y i | η i , θ ) , {\displaystyle \pi ({\boldsymbol {y}}|{\boldsymbol {x}},{\boldsymbol {\theta }})=\prod _{i\in {\mathcal {I}}}\pi (y_{i}|\eta _{i},{\boldsymbol {\theta }}),} where I {\displaystyle {\mathcal {I}}} is the set of indices for observed elements of y {\displaystyle {\boldsymbol {y}}} (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor η {\displaystyle {\boldsymbol {\eta }}} is part of x {\displaystyle {\boldsymbol {x}}} .

For the model to be a latent Gaussian model, it is assumed that x | θ {\displaystyle {\boldsymbol {x}}|{\boldsymbol {\theta }}} is a Gaussian Markov Random Field (GMRF) (that is, a multivariate Gaussian with additional conditional independence properties) with probability density π ( x | θ ) | Q θ | 1 / 2 exp ( 1 2 x T Q θ x ) , {\displaystyle \pi ({\boldsymbol {x}}|{\boldsymbol {\theta }})\propto \left|{\boldsymbol {Q_{\theta }}}\right|^{1/2}\exp \left(-{\frac {1}{2}}{\boldsymbol {x}}^{T}{\boldsymbol {Q_{\theta }}}{\boldsymbol {x}}\right),} where Q θ {\displaystyle {\boldsymbol {Q_{\theta }}}} is a θ {\displaystyle {\boldsymbol {\theta }}} -dependent sparse precision matrix and | Q θ | {\displaystyle \left|{\boldsymbol {Q_{\theta }}}\right|} is its determinant. The precision matrix is sparse due to the GMRF assumption. The prior distribution π ( θ ) {\displaystyle \pi ({\boldsymbol {\theta }})} for the hyperparameters need not be Gaussian. However, the number of hyperparameters, m = d i m ( θ ) {\displaystyle m=\mathrm {dim} ({\boldsymbol {\theta }})} , is assumed to be small (say, less than 15).

Approximate Bayesian inference with INLA

In Bayesian inference, one wants to solve for the posterior distribution of the latent variables x {\displaystyle {\boldsymbol {x}}} and θ {\displaystyle {\boldsymbol {\theta }}} . Applying Bayes' theorem π ( x , θ | y ) = π ( y | x , θ ) π ( x | θ ) π ( θ ) π ( y ) , {\displaystyle \pi ({\boldsymbol {x}},{\boldsymbol {\theta }}|{\boldsymbol {y}})={\frac {\pi ({\boldsymbol {y}}|{\boldsymbol {x}},{\boldsymbol {\theta }})\pi ({\boldsymbol {x}}|{\boldsymbol {\theta }})\pi ({\boldsymbol {\theta }})}{\pi ({\boldsymbol {y}})}},} the joint posterior distribution of x {\displaystyle {\boldsymbol {x}}} and θ {\displaystyle {\boldsymbol {\theta }}} is given by π ( x , θ | y ) π ( θ ) π ( x | θ ) i π ( y i | η i , θ ) π ( θ ) | Q θ | 1 / 2 exp ( 1 2 x T Q θ x + i log [ π ( y i | η i , θ ) ] ) . {\displaystyle {\begin{aligned}\pi ({\boldsymbol {x}},{\boldsymbol {\theta }}|{\boldsymbol {y}})&\propto \pi ({\boldsymbol {\theta }})\pi ({\boldsymbol {x}}|{\boldsymbol {\theta }})\prod _{i}\pi (y_{i}|\eta _{i},{\boldsymbol {\theta }})\\&\propto \pi ({\boldsymbol {\theta }})\left|{\boldsymbol {Q_{\theta }}}\right|^{1/2}\exp \left(-{\frac {1}{2}}{\boldsymbol {x}}^{T}{\boldsymbol {Q_{\theta }}}{\boldsymbol {x}}+\sum _{i}\log \left\right).\end{aligned}}} Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals π ( x i | y ) = π ( x i | θ , y ) π ( θ | y ) d θ π ( θ j | y ) = π ( θ | y ) d θ j , {\displaystyle {\begin{array}{rcl}\pi (x_{i}|{\boldsymbol {y}})&=&\int \pi (x_{i}|{\boldsymbol {\theta }},{\boldsymbol {y}})\pi ({\boldsymbol {\theta }}|{\boldsymbol {y}})d{\boldsymbol {\theta }}\\\pi (\theta _{j}|{\boldsymbol {y}})&=&\int \pi ({\boldsymbol {\theta }}|{\boldsymbol {y}})d{\boldsymbol {\theta }}_{-j},\end{array}}} where θ j = ( θ 1 , , θ j 1 , θ j + 1 , , θ m ) {\displaystyle {\boldsymbol {\theta }}_{-j}=\left(\theta _{1},\dots ,\theta _{j-1},\theta _{j+1},\dots ,\theta _{m}\right)} .

A key idea of INLA is to construct nested approximations given by π ~ ( x i | y ) = π ~ ( x i | θ , y ) π ~ ( θ | y ) d θ π ~ ( θ j | y ) = π ~ ( θ | y ) d θ j , {\displaystyle {\begin{array}{rcl}{\widetilde {\pi }}(x_{i}|{\boldsymbol {y}})&=&\int {\widetilde {\pi }}(x_{i}|{\boldsymbol {\theta }},{\boldsymbol {y}}){\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})d{\boldsymbol {\theta }}\\{\widetilde {\pi }}(\theta _{j}|{\boldsymbol {y}})&=&\int {\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})d{\boldsymbol {\theta }}_{-j},\end{array}}} where π ~ ( | ) {\displaystyle {\widetilde {\pi }}(\cdot |\cdot )} is an approximated posterior density. The approximation to the marginal density π ( x i | y ) {\displaystyle \pi (x_{i}|{\boldsymbol {y}})} is obtained in a nested fashion by first approximating π ( θ | y ) {\displaystyle \pi ({\boldsymbol {\theta }}|{\boldsymbol {y}})} and π ( x i | θ , y ) {\displaystyle \pi (x_{i}|{\boldsymbol {\theta }},{\boldsymbol {y}})} , and then numerically integrating out θ {\displaystyle {\boldsymbol {\theta }}} as π ~ ( x i | y ) = k π ~ ( x i | θ k , y ) × π ~ ( θ k | y ) × Δ k , {\displaystyle {\begin{aligned}{\widetilde {\pi }}(x_{i}|{\boldsymbol {y}})=\sum _{k}{\widetilde {\pi }}\left(x_{i}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)\times {\widetilde {\pi }}({\boldsymbol {\theta }}_{k}|{\boldsymbol {y}})\times \Delta _{k},\end{aligned}}} where the summation is over the values of θ {\displaystyle {\boldsymbol {\theta }}} , with integration weights given by Δ k {\displaystyle \Delta _{k}} . The approximation of π ( θ j | y ) {\displaystyle \pi (\theta _{j}|{\boldsymbol {y}})} is computed by numerically integrating θ j {\displaystyle {\boldsymbol {\theta }}_{-j}} out from π ~ ( θ | y ) {\displaystyle {\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})} .

To get the approximate distribution π ~ ( θ | y ) {\displaystyle {\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})} , one can use the relation π ( θ | y ) = π ( x , θ , y ) π ( x | θ , y ) π ( y ) , {\displaystyle {\begin{aligned}{\pi }({\boldsymbol {\theta }}|{\boldsymbol {y}})={\frac {\pi \left({\boldsymbol {x}},{\boldsymbol {\theta }},{\boldsymbol {y}}\right)}{\pi \left({\boldsymbol {x}}|{\boldsymbol {\theta }},{\boldsymbol {y}}\right)\pi ({\boldsymbol {y}})}},\end{aligned}}} as the starting point. Then π ~ ( θ | y ) {\displaystyle {\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})} is obtained at a specific value of the hyperparameters θ = θ k {\displaystyle {\boldsymbol {\theta }}={\boldsymbol {\theta }}_{k}} with Laplace's approximation π ~ ( θ k | y ) π ( x , θ k , y ) π ~ G ( x | θ k , y ) | x = x ( θ k ) , π ( y | x , θ k ) π ( x | θ k ) π ( θ k ) π ~ G ( x | θ k , y ) | x = x ( θ k ) , {\displaystyle {\begin{aligned}{\widetilde {\pi }}({\boldsymbol {\theta }}_{k}|{\boldsymbol {y}})&\propto \left.{\frac {\pi \left({\boldsymbol {x}},{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)}{{\widetilde {\pi }}_{G}\left({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)}}\right\vert _{{\boldsymbol {x}}={\boldsymbol {x}}^{*}({\boldsymbol {\theta }}_{k})},\\&\propto \left.{\frac {\pi ({\boldsymbol {y}}|{\boldsymbol {x}},{\boldsymbol {\theta }}_{k})\pi ({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k})\pi ({\boldsymbol {\theta }}_{k})}{{\widetilde {\pi }}_{G}\left({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)}}\right\vert _{{\boldsymbol {x}}={\boldsymbol {x}}^{*}({\boldsymbol {\theta }}_{k})},\end{aligned}}} where π ~ G ( x | θ k , y ) {\displaystyle {\widetilde {\pi }}_{G}\left({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)} is the Gaussian approximation to π ( x | θ k , y ) {\displaystyle {\pi }\left({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)} whose mode at a given θ k {\displaystyle {\boldsymbol {\theta }}_{k}} is x ( θ k ) {\displaystyle {\boldsymbol {x}}^{*}({\boldsymbol {\theta }}_{k})} . The mode can be found numerically for example with the Newton-Raphson method.

The trick in the Laplace approximation above is the fact that the Gaussian approximation is applied on the full conditional of x {\displaystyle {\boldsymbol {x}}} in the denominator since it is usually close to a Gaussian due to the GMRF property of x {\displaystyle {\boldsymbol {x}}} . Applying the approximation here improves the accuracy of the method, since the posterior π ( θ | y ) {\displaystyle {\pi }({\boldsymbol {\theta }}|{\boldsymbol {y}})} itself need not be close to a Gaussian, and so the Gaussian approximation is not directly applied on π ( θ | y ) {\displaystyle {\pi }({\boldsymbol {\theta }}|{\boldsymbol {y}})} . The second important property of a GMRF, the sparsity of the precision matrix Q θ k {\displaystyle {\boldsymbol {Q}}_{{\boldsymbol {\theta }}_{k}}} , is required for efficient computation of π ~ ( θ k | y ) {\displaystyle {\widetilde {\pi }}({\boldsymbol {\theta }}_{k}|{\boldsymbol {y}})} for each value θ k {\displaystyle {{\boldsymbol {\theta }}_{k}}} .

Obtaining the approximate distribution π ~ ( x i | θ k , y ) {\displaystyle {\widetilde {\pi }}\left(x_{i}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)} is more involved, and the INLA method provides three options for this: Gaussian approximation, Laplace approximation, or the simplified Laplace approximation. For the numerical integration to obtain π ~ ( x i | y ) {\displaystyle {\widetilde {\pi }}(x_{i}|{\boldsymbol {y}})} , also three options are available: grid search, central composite design, or empirical Bayes.

References

  1. ^ Rue, Håvard; Martino, Sara; Chopin, Nicolas (2009). "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations". J. R. Statist. Soc. B. 71 (2): 319–392. doi:10.1111/j.1467-9868.2008.00700.x. hdl:2066/75507. S2CID 1657669.
  2. Taylor, Benjamin M.; Diggle, Peter J. (2014). "INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes". Journal of Statistical Computation and Simulation. 84 (10): 2266–2284. arXiv:1202.1738. doi:10.1080/00949655.2013.788653. S2CID 88511801.
  3. Teng, M.; Nathoo, F.; Johnson, T. D. (2017). "Bayesian computation for Log-Gaussian Cox processes: a comparative analysis of methods". Journal of Statistical Computation and Simulation. 87 (11): 2227–2252. doi:10.1080/00949655.2017.1326117. PMC 5708893. PMID 29200537.
  4. Wang, Xiaofeng; Yue, Yu Ryan; Faraway, Julian J. (2018). Bayesian Regression Modeling with INLA. Chapman and Hall/CRC. ISBN 9781498727259.
  5. Blangiardo, Marta; Cameletti, Michela (2015). Spatial and Spatio-temporal Bayesian Models with R-INLA. John Wiley & Sons, Ltd. ISBN 9781118326558.
  6. Opitz, T. (2017). "Latent Gaussian modeling and INLA: A review with focus on space-time applications". Journal de la Société Française de Statistique. 158: 62–85. arXiv:1708.02723.
  7. Moraga, Paula (2019). Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapman and Hall/CRC. ISBN 9780367357955.
  8. Lindgren, Finn; Rue, Håvard; Lindström, Johan (2011). "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach". J. R. Statist. Soc. B. 73 (4): 423–498. doi:10.1111/j.1467-9868.2011.00777.x. hdl:20.500.11820/1084d335-e5b4-4867-9245-ec9c4f6f4645. S2CID 120949984.
  9. Lezama-Ochoa, N.; Grazia Pennino, M.; Hall, M. A.; Lopez, J.; Murua, H. (2020). "Using a Bayesian modelling approach (INLA-SPDE) to predict the occurrence of the Spinetail Devil Ray (Mobular mobular)". Scientific Reports. 10 (1): 18822. Bibcode:2020NatSR..1018822L. doi:10.1038/s41598-020-73879-3. PMC 7606447. PMID 33139744.
  10. "R-INLA Project". Retrieved 21 April 2022.

Further reading

  • Gomez-Rubio, Virgilio (2021). Bayesian inference with INLA. Chapman and Hall/CRC. ISBN 978-1-03-217453-2.
Categories: