Deviance (statistics)

Article snapshot taken from Wikipedia, under the Creative Commons Attribution-ShareAlike license.
Measure of goodness of fit for a statistical model.
Not to be confused with Deviate (statistics), Deviation (statistics), Discrepancy (statistics), or Divergence (statistics).

In statistics, deviance is a goodness-of-fit statistic for a statistical model; it is often used for statistical hypothesis testing. It is a generalization of the idea of using the sum of squares of residuals (SSR) in ordinary least squares to cases where model-fitting is achieved by maximum likelihood. It plays an important role in exponential dispersion models and generalized linear models.

Deviance is closely related to the Kullback–Leibler divergence.

Definition

The unit deviance $d(y,\mu)$ is a bivariate function that satisfies the following conditions:

  • $d(y,y) = 0$
  • $d(y,\mu) > 0 \quad \forall\, y \neq \mu$

The total deviance $D(\mathbf{y}, \hat{\boldsymbol{\mu}})$ of a model with predictions $\hat{\boldsymbol{\mu}}$ of the observations $\mathbf{y}$ is the sum of its unit deviances: $D(\mathbf{y}, \hat{\boldsymbol{\mu}}) = \sum_i d(y_i, \hat{\mu}_i)$.

The (total) deviance for a model $M_0$ with estimates $\hat{\mu} = \operatorname{E}[Y \mid \hat{\theta}_0]$, based on a dataset $y$, may be constructed from its likelihood as:

$$D(y, \hat{\mu}) = 2\left(\log\left[p(y \mid \hat{\theta}_s)\right] - \log\left[p(y \mid \hat{\theta}_0)\right]\right).$$

Here $\hat{\theta}_0$ denotes the fitted values of the parameters in the model $M_0$, while $\hat{\theta}_s$ denotes the fitted parameters for the saturated model; both sets of fitted values are implicitly functions of the observations $y$. The saturated model is a model with a parameter for every observation, so that the data are fitted exactly. This expression is simply 2 times the log-likelihood ratio of the full (saturated) model compared to the reduced model. The deviance is used to compare two models – in particular in the case of generalized linear models (GLMs), where it plays a role similar to that of the residual sum of squares (RSS) from ANOVA in linear models.
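As an illustration of this construction, the following Python sketch (with hypothetical counts and fitted means) computes the deviance of a Poisson model as twice the log-likelihood gap to the saturated model, which sets each fitted mean equal to the observation, and checks that it agrees with the sum of Poisson unit deviances given in the Examples section.

```python
import math

def poisson_loglik(y, mu):
    # log p(y | mu) for independent Poisson observations:
    # sum_i [ y_i log(mu_i) - mu_i - log(y_i!) ]
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_deviance(y, mu_hat):
    # D(y, mu_hat) = 2 * (loglik of saturated model - loglik of fitted model);
    # the saturated model fits each observation exactly: mu_i = y_i
    return 2 * (poisson_loglik(y, y) - poisson_loglik(y, mu_hat))

# hypothetical counts and fitted means, purely illustrative
y      = [2, 5, 1, 7, 3]
mu_hat = [2.5, 4.0, 1.5, 6.0, 4.0]

# the same quantity from the unit deviance d(y, mu) = 2(y log(y/mu) - y + mu);
# the log(y_i!) terms cancel in the difference above
unit_sum = sum(2 * (yi * math.log(yi / mi) - yi + mi)
               for yi, mi in zip(y, mu_hat))

print(abs(poisson_deviance(y, mu_hat) - unit_sum) < 1e-9)  # True
```

Note that the factorial terms $\log(y_i!)$ appear in both log-likelihoods and cancel, which is why the unit-deviance form contains no factorials.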

Suppose, in the framework of the GLM, that we have two nested models, M1 and M2. In particular, suppose that M1 contains the parameters in M2, plus k additional parameters. Then, under the null hypothesis that M2 is the true model, the difference between the deviances of the two models follows, by Wilks' theorem, an approximate chi-squared distribution with k degrees of freedom. This can be used for hypothesis testing on the deviance.
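A minimal sketch of such a test, with hypothetical deviance values; for k = 1, the 5% critical value of the chi-squared distribution is about 3.84:

```python
# Deviance-based test for two nested GLMs (hypothetical numbers).
# M1 has k = 1 more parameter than M2; under H0 ("M2 is the true model"),
# D(M2) - D(M1) is approximately chi-squared with k degrees of freedom.

deviance_m2 = 27.3   # deviance of the smaller (null) model, assumed fitted elsewhere
deviance_m1 = 21.9   # deviance of the larger model, assumed fitted elsewhere
k = 1                # number of extra parameters in M1

diff = deviance_m2 - deviance_m1   # ≈ 5.4
chi2_crit_95 = 3.841               # 95th percentile of chi-squared with 1 df

if diff > chi2_crit_95:
    print("reject H0: the extra parameter significantly improves the fit")
else:
    print("no evidence that the extra parameter is needed")
```

In practice the two deviances would come from a fitted GLM (e.g. via a statistics library), and the p-value would be computed from the chi-squared survival function rather than compared to a hard-coded critical value.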

Some usage of the term "deviance" can be confusing. According to Collett:

"the quantity $-2\log\left[p(y \mid \hat{\theta}_0)\right]$ is sometimes referred to as a deviance. This is inappropriate, since, unlike the deviance used in the context of generalized linear modelling, $-2\log\left[p(y \mid \hat{\theta}_0)\right]$ does not measure deviation from a model that is a perfect fit to the data."

However, since the principal use is in the form of the difference of the deviances of two models, this confusion in definition is unimportant.

Examples

The unit deviance for the Poisson distribution is $d(y,\mu) = 2\left(y \log \frac{y}{\mu} - y + \mu\right)$; the unit deviance for the normal distribution with unit variance is $d(y,\mu) = (y - \mu)^2$.
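A short Python check of these two formulas, confirming the defining properties $d(y,y) = 0$ and $d(y,\mu) > 0$ for $y \neq \mu$, and showing that in the normal case the total deviance reduces to the residual sum of squares (the values below are hypothetical):

```python
import math

def poisson_unit_deviance(y, mu):
    # d(y, mu) = 2 * (y log(y/mu) - y + mu), for y > 0
    return 2 * (y * math.log(y / mu) - y + mu)

def normal_unit_deviance(y, mu):
    # d(y, mu) = (y - mu)^2 for the normal distribution with unit variance
    return (y - mu) ** 2

# d(y, y) = 0 and d(y, mu) > 0 for y != mu, as the definition requires
print(poisson_unit_deviance(3.0, 3.0))      # 0.0
print(poisson_unit_deviance(3.0, 2.0) > 0)  # True

# for the normal case, the total deviance is exactly the residual sum of squares
y      = [1.2, 0.7, 2.5]
mu_hat = [1.0, 1.0, 2.0]
rss = sum(normal_unit_deviance(yi, mi) for yi, mi in zip(y, mu_hat))
print(round(rss, 2))  # 0.38
```

This is the sense in which deviance generalizes the sum of squared residuals from ordinary least squares: for the normal model they coincide, while other distributions substitute their own unit deviance.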

Notes

  1. Hastie, Trevor. "A closer look at the deviance." The American Statistician 41.1 (1987): 16-20.
  2. Jørgensen, B. (1997). The Theory of Dispersion Models. Chapman & Hall.
  3. Song, Peter X.-K. (2007). Correlated Data Analysis: Modeling, Analytics, and Applications. Springer Series in Statistics. doi:10.1007/978-0-387-71393-9. ISBN 978-0-387-71392-2.
  4. Nelder, J.A.; Wedderburn, R.W.M. (1972). "Generalized Linear Models". Journal of the Royal Statistical Society. Series A (General). 135 (3): 370–384. doi:10.2307/2344614. JSTOR 2344614. S2CID 14154576.
  5. McCullagh and Nelder (1989): page 17
  6. Collett (2003): page 76

References

  • Collett, David (2003). Modelling Survival Data in Medical Research, Second Edition. Chapman & Hall/CRC. ISBN 1-58488-325-1.
