Fraction of variance unexplained

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
For broader coverage of this topic, see Explained variation.

In statistics, the fraction of variance unexplained (FVU) in the context of a regression task is the fraction of variance of the regressand (dependent variable) Y which cannot be explained, i.e., which is not correctly predicted, by the explanatory variables X.

Formal definition

Suppose we are given a regression function $f$ yielding for each $y_i$ an estimate $\widehat{y}_i = f(x_i)$, where $x_i$ is the vector of the $i$-th observations on all the explanatory variables. We define the fraction of variance unexplained (FVU) as:

$$
\begin{aligned}
\text{FVU} &= \frac{\text{VAR}_{\text{err}}}{\text{VAR}_{\text{tot}}} = \frac{\text{SS}_{\text{err}}/N}{\text{SS}_{\text{tot}}/N} = \frac{\text{SS}_{\text{err}}}{\text{SS}_{\text{tot}}} \left(= 1 - \frac{\text{SS}_{\text{reg}}}{\text{SS}_{\text{tot}}}, \text{ only true in some cases such as linear regression}\right) \\
&= 1 - R^{2}
\end{aligned}
$$

where $R^{2}$ is the coefficient of determination, and $\text{VAR}_{\text{err}}$ and $\text{VAR}_{\text{tot}}$ are the variance of the residuals and the sample variance of the dependent variable, respectively. $\text{SS}_{\text{err}}$ (the sum of squared prediction errors, equivalently the residual sum of squares), $\text{SS}_{\text{tot}}$ (the total sum of squares), and $\text{SS}_{\text{reg}}$ (the sum of squares of the regression, equivalently the explained sum of squares) are given by

$$
\begin{aligned}
\text{SS}_{\text{err}} &= \sum_{i=1}^{N} (y_i - \widehat{y}_i)^2 \\
\text{SS}_{\text{tot}} &= \sum_{i=1}^{N} (y_i - \bar{y})^2 \\
\text{SS}_{\text{reg}} &= \sum_{i=1}^{N} (\widehat{y}_i - \bar{y})^2 \text{ and} \\
\bar{y} &= \frac{1}{N} \sum_{i=1}^{N} y_i .
\end{aligned}
$$
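
As a minimal sketch of these definitions, the following Python computes the three sums of squares and the FVU for a one-variable least-squares line. The data and the helper names (`sums_of_squares`, `fit_line`) are made up for illustration and are not part of the source. Because the fit is ordinary least squares with an intercept, the parenthesised identity FVU = 1 − SS_reg/SS_tot is expected to hold here; for other regression functions it need not.

```python
def sums_of_squares(y, y_hat):
    """Return (SS_err, SS_tot, SS_reg) for observations y and predictions y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((fi - y_bar) ** 2 for fi in y_hat)
    return ss_err, ss_tot, ss_reg


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, chosen only to illustrate the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]
ss_err, ss_tot, ss_reg = sums_of_squares(y, y_hat)

fvu = ss_err / ss_tot              # FVU = SS_err / SS_tot
r_squared = 1 - fvu                # coefficient of determination
fvu_via_reg = 1 - ss_reg / ss_tot  # agrees with fvu for OLS with an intercept

print(fvu, fvu_via_reg, r_squared)
```

The two FVU values agree up to floating-point rounding because, for this kind of fit, SS_tot = SS_err + SS_reg.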

Alternatively, the fraction of variance unexplained can be defined as follows:

$$
\text{FVU} = \frac{\operatorname{MSE}(f)}{\operatorname{var}[Y]}
$$

where $\operatorname{MSE}(f)$ is the mean squared error of the regression function $f$.
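
The alternative definition can be sketched in a few lines of Python. The helper names and the data below are hypothetical. One assumption is worth stating: the variance here divides by N, so that it matches SS_tot / N in the first definition; using the N − 1 denominator instead would change the ratio slightly.

```python
def mse(y, y_hat):
    """Mean squared error of predictions y_hat against observations y."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y)


def variance(y):
    """Variance of y with a 1/N denominator (assumed, to match SS_tot / N)."""
    y_bar = sum(y) / len(y)
    return sum((yi - y_bar) ** 2 for yi in y) / len(y)


def fvu(y, y_hat):
    """Fraction of variance unexplained as MSE(f) / var[Y]."""
    return mse(y, y_hat) / variance(y)


# Hypothetical observations and predictions.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.5, 8.5]

print(fvu(y, y_hat))  # MSE = 0.25, var = 5.0, so FVU = 0.05
```

Since both numerator and denominator carry the same factor 1/N, this ratio equals SS_err / SS_tot.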

Explanation

It is useful to consider the second definition to understand FVU. When trying to predict Y, the most naive regression function that we can think of is the constant function predicting the mean of Y, i.e., $f(x_i) = \bar{y}$. It follows that the MSE of this function equals the variance of Y; that is, $\text{SS}_{\text{err}} = \text{SS}_{\text{tot}}$, and $\text{SS}_{\text{reg}} = 0$. In this case, no variation in Y can be accounted for, and the FVU equals 1, which is its maximum for any model that predicts at least as well as the mean.
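
The step from the constant predictor to an FVU of 1 can be written out using the definitions above, with the variance taken with a 1/N denominator as in the first definition:

$$
\operatorname{MSE}(f) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2 = \frac{\text{SS}_{\text{tot}}}{N} = \operatorname{var}[Y]
\quad\Longrightarrow\quad
\text{FVU} = \frac{\operatorname{MSE}(f)}{\operatorname{var}[Y]} = 1 .
$$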

More generally, the FVU will be 1 if the explanatory variables X tell us nothing about Y in the sense that the predicted values of Y do not covary with Y. As prediction improves and the MSE falls, the FVU decreases. In the case of perfect prediction, where $\widehat{y}_i = y_i$ for all $i$, the MSE is 0, $\text{SS}_{\text{err}} = 0$, $\text{SS}_{\text{reg}} = \text{SS}_{\text{tot}}$, and the FVU is 0.
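
The two limiting cases, and one intermediate case, can be checked numerically. This is an illustrative sketch with made-up data; the function name `fvu` is a hypothetical helper, not something defined by the source.

```python
def fvu(y, y_hat):
    """Fraction of variance unexplained, computed as SS_err / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return ss_err / ss_tot


# Hypothetical observations.
y = [2.0, 4.0, 6.0, 8.0]
y_bar = sum(y) / len(y)

fvu_mean = fvu(y, [y_bar] * len(y))          # constant mean predictor
fvu_partial = fvu(y, [3.0, 4.0, 6.0, 7.0])   # imperfect but informative predictions
fvu_perfect = fvu(y, list(y))                # predictions equal the observations

print(fvu_mean, fvu_partial, fvu_perfect)  # 1.0 0.1 0.0
```

The intermediate value falls between the two extremes, in line with the statement that the FVU decreases as the squared prediction error shrinks.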

References

  1. Achen, C. H. (1990). "What Does 'Explained Variance' Explain?: Reply". Political Analysis. 2 (1): 173–184. doi:10.1093/pan/2.1.173.