The law of large numbers is a fundamental concept in probability and statistics that describes how the average of a randomly selected large sample from a population is likely to be close to the average of the whole population:
If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.
More simply, as an experiment is repeated over and over, the observed relative frequency of an outcome approaches its actual probability. This is important because it means that if we do not know the probability of some natural event (say, the chance that it will rain), we can estimate that probability through observation and experimentation.
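As an illustration, here is a minimal Python sketch of this idea. The event's true probability (0.3 below) is a stand-in chosen for the simulation and would be unknown in practice; the observed frequency recovers it as the number of trials grows.

```python
import random

# A minimal sketch: estimate an unknown probability by repeated trials.
# The "natural event" is simulated with a hypothetical true probability
# of 0.3, which in practice would be unknown.
TRUE_P = 0.3

def observe() -> bool:
    """One independent observation of the event (simulation stand-in)."""
    return random.random() < TRUE_P

for n in (100, 10_000, 1_000_000):
    hits = sum(observe() for _ in range(n))
    print(f"n={n:>9}: observed frequency = {hits / n:.4f}")
# The observed frequency drifts toward 0.3 as n grows.
```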
Origins of the term
Jacob Bernoulli first described the law of large numbers as so simple that even the stupidest men instinctively know it is true. Despite this, it took him over 20 years to develop a sufficiently rigorous mathematical proof, which he published in Ars Conjectandi (The Art of Conjecturing) in 1713. He named this his "Golden Theorem", but it became generally known as "Bernoulli's Theorem". In 1835, S. D. Poisson further described it under the name "la loi des grands nombres" (the law of large numbers). Thereafter it was known under both names, but the "law of large numbers" is most frequently used.
After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev, Markov, Borel, Cantelli, Kolmogorov, Vapnik and Chervonenkis. These further studies have given rise to two prominent forms of the law of large numbers: the "weak" law and the "strong" law. These forms do not describe different laws but instead refer to different ways of describing the convergence of the sample mean toward the population mean.
Probability
The law of large numbers is called "the first fundamental theorem of probability". It was derived by analysis of games of chance, such as the drawing of lots or the casting of dice, which are governed by probability. For example, a fair, six-sided die may come up "1", "2", "3", "4", "5" or "6" dots on a single throw, and if these dots are counted as numbers, it is possible to calculate the value of an "average" roll.
We know that over many rolls, one roll in six will result in a "1". Likewise, one roll in six will result in "2" and so on through all 6 possible rolls. Counting the results as numbers gives:

\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5
Of course, there is no single side of the die that has 3.5 dots, and so no single roll of the die will result in a value of "3.5". But after a large number of rolls are recorded, the average score of all rolls will approach 3.5.
Furthermore, as the rolls accumulate, the count of each particular result ("1", "2", "3", "4", "5" or "6") will increasingly approach 1/6 of the total number of rolls, as the simulation below illustrates.
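A short Python sketch of this dice example; the seed and the roll counts are arbitrary choices made for illustration.

```python
import random
from collections import Counter

# Simulate fair six-sided die rolls and watch the running average
# approach 3.5 and each face's share of the rolls approach 1/6.
random.seed(0)  # illustrative seed, not part of the original example

for n in (60, 6_000, 600_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    counts = Counter(rolls)
    avg = sum(rolls) / n
    share_of_3s = counts[3] / n
    print(f"n={n:>7}: average = {avg:.3f}, share of 3s = {share_of_3s:.4f}")
# The average drifts toward 3.5 and the share of 3s toward 1/6 ≈ 0.1667.
```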
Misunderstanding this law may lead to the belief that if an event has not occurred in many trials, the probability of it occurring in a subsequent trial is increased. For example, the probability of a fair die turning up a 3 is 1 in 6. The law of large numbers says that over a large number of throws, the observed frequency of 3s will be close to 1 in 6 (16 2/3%). This, however, does not mean that if the first 5 throws of the die do not turn up a 3, the sixth throw is more likely to produce a 3. Each roll is independent, the probability of rolling a 3 remains exactly the same from roll to roll, and the value of any one individual observation cannot be predicted based upon past observations. Such erroneous predictions are known as the gambler's fallacy.
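To see empirically that past rolls do not change the odds, the following sketch conditions on sequences of five rolls containing no 3 and checks how often a 3 appears on the sixth roll; the trial count is an arbitrary choice.

```python
import random

# Test the gambler's fallacy empirically: among sequences of five rolls
# with no 3, is a 3 any more likely on the sixth roll? (It isn't.)
random.seed(1)
sixth_roll_threes = 0
qualifying_sequences = 0

for _ in range(1_000_000):
    first_five = [random.randint(1, 6) for _ in range(5)]
    if 3 not in first_five:
        qualifying_sequences += 1
        if random.randint(1, 6) == 3:
            sixth_roll_threes += 1

# Stays near 1/6 ≈ 0.1667 despite the five preceding non-3s.
print(sixth_roll_threes / qualifying_sequences)
```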
The "law of large numbers" is sometimes invoked to refer to the notion that even very improbable events may occur when a sufficiently large number of instances are given.
Statistics
The law of large numbers was derived through analysis of probability. Statistics evolved from probability theory, and in statistics the law of large numbers means that a large sample is more likely than a smaller sample to share the characteristics of the whole population.
To illustrate, picture a water bottling plant producing 10,000 bottles of water a day. The plant manager measures the volume of water in a large number (say 200) of the bottles produced that day, and finds that the average is 0.997 liters. In this case, the plant manager may conclude that the average of all bottles produced that day is likewise not quite 1 liter.
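Here is a sketch of this scenario in Python; the normal distribution of fill volumes and its parameters are assumptions made purely for illustration.

```python
import random

# Model a day's production of 10,000 bottles with fill volumes drawn
# around a hypothetical true mean of 0.997 L, then compare the mean of
# a 200-bottle sample with the mean of the whole day's production.
random.seed(2)
day_production = [random.gauss(0.997, 0.01) for _ in range(10_000)]

sample = random.sample(day_production, 200)
sample_mean = sum(sample) / len(sample)
population_mean = sum(day_production) / len(day_production)
print(f"sample mean of 200 bottles : {sample_mean:.4f} L")
print(f"mean of all 10,000 bottles : {population_mean:.4f} L")
```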
Forms and Proofs
The weak law
The weak law of large numbers states that if X1, X2, X3, ... is an infinite sequence of random variables, where all the random variables have the same expected value μ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then the sample average

\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}

converges in probability to μ.
Or, somewhat more tersely:
For any positive number ε, no matter how small, we have

\lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| < \varepsilon \right) = 1.
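A Monte Carlo sketch of this statement for die rolls (μ = 3.5); the choices of ε, the sample sizes and the number of trials are arbitrary illustration parameters.

```python
import random

# Estimate P(|sample average - mu| >= eps) for fair die rolls at
# several sample sizes n; under the weak law it falls toward 0.
random.seed(3)
MU, EPS, TRIALS = 3.5, 0.1, 1_000

for n in (100, 1_000, 10_000):
    exceed = 0
    for _ in range(TRIALS):
        avg = sum(random.randint(1, 6) for _ in range(n)) / n
        if abs(avg - MU) >= EPS:
            exceed += 1
    print(f"n={n:>6}: estimated P(|avg - mu| >= {EPS}) = {exceed / TRIALS:.3f}")
```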
Proof
Chebyshev's inequality is used to prove this result. Finite variance \operatorname{Var}(X_i) = \sigma^2 (for all i) and no correlation yield that

\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}.
The common mean μ of the sequence is the mean of the sample average:

E(\bar{X}_n) = \mu.
Using Chebyshev's inequality on \bar{X}_n results in

P\left( \left| \bar{X}_n - \mu \right| \geq \varepsilon \right) \leq \frac{\sigma^2}{n \varepsilon^2}.
This may be used to obtain the following:

P\left( \left| \bar{X}_n - \mu \right| < \varepsilon \right) = 1 - P\left( \left| \bar{X}_n - \mu \right| \geq \varepsilon \right) \geq 1 - \frac{\sigma^2}{n \varepsilon^2}.

As n approaches infinity, the right-hand side approaches 1, and with it the probability itself.
This completes the proof.
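As a numeric check of the bound just derived, the following sketch compares σ²/(nε²) with a simulated tail probability for fair die rolls (σ² = 35/12, μ = 3.5); the choices of ε, n and trial count are arbitrary.

```python
import random

# Compare the Chebyshev bound sigma^2 / (n * eps^2) from the proof with
# a simulated value of P(|sample average - mu| >= eps) for die rolls.
random.seed(4)
SIGMA2, MU, EPS, N, TRIALS = 35 / 12, 3.5, 0.05, 5_000, 2_000

exceed = sum(
    abs(sum(random.randint(1, 6) for _ in range(N)) / N - MU) >= EPS
    for _ in range(TRIALS)
)
print(f"Chebyshev bound : {SIGMA2 / (N * EPS**2):.4f}")
print(f"simulated tail  : {exceed / TRIALS:.4f}")  # well under the bound
```

As the numbers show, the Chebyshev bound is loose: it guarantees convergence but the actual tail probability is typically much smaller.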
The result holds also for the 'infinite variance' case, provided the X_i are mutually independent and their (finite) mean μ exists.
A consequence of the weak law of large numbers is the asymptotic equipartition property.
The strong law
The strong law of large numbers states that if X1, X2, X3, ... is an infinite sequence of random variables that are pairwise independent and identically distributed with

E(X_i) = \mu < \infty,

then

P\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1,

i.e., the sample average converges almost surely to μ.
If we replace the finite expectation condition with a finite second moment condition, E(X_i^2) < ∞ (which is the same as assuming that X_i has finite variance), then we obtain both almost sure convergence and convergence in mean square. In either case, these conditions also imply the weak law of large numbers, since almost sure convergence implies convergence in probability (as, indeed, does convergence in mean square).
This law justifies the intuitive interpretation of the expected value of a random variable as the "long-term average when sampling repeatedly".
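A sketch of what almost sure convergence looks like on a single realized sequence of die rolls; the seed and checkpoint values are arbitrary choices for illustration.

```python
import random

# Follow one realized sequence of fair die rolls and print its running
# average at increasing n. Under the strong law, this single path
# settles at mu = 3.5 (with probability 1).
random.seed(5)
total = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}

for n in range(1, 100_001):
    total += random.randint(1, 6)
    if n in checkpoints:
        print(f"n={n:>6}: running average = {total / n:.4f}")
```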
A weaker law and proof
Proofs of the above weak and strong laws of large numbers are rather involved. The slightly weaker form below is implied by the weak law above (since convergence in distribution is implied by convergence in probability), but it has a simpler proof.
Theorem. Let X1, X2, X3, ... be a sequence of random variables, independent and identically distributed with common finite mean μ, and define the partial sum Sn := X1 + X2 + ... + Xn. Then Sn / n converges in distribution to μ.
Proof. (See [1], p. 174.) By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ can be written as

\varphi_X(t) = 1 + it\mu + o(t), \quad t \to 0.
Then, since the characteristic function of the sum of independent random variables is the product of their characteristic functions, the characteristic function of Sn / n is

\varphi_{S_n/n}(t) = \left[ \varphi_X\!\left( \frac{t}{n} \right) \right]^n = \left[ 1 + \frac{i\mu t}{n} + o\!\left( \frac{t}{n} \right) \right]^n \to e^{it\mu} \quad \text{as } n \to \infty.
The limit e^{itμ} is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, Sn / n converges in distribution to μ. Note that the proof of the central limit theorem, which tells us more about the convergence of the average to μ (when the variance σ² is finite), follows a very similar approach.
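As a numeric sketch of the limit used in this proof, one can pick a distribution whose characteristic function has a closed form; the Exponential(1) choice below (mean μ = 1, φ(t) = 1/(1 − it)) is an assumption made for illustration only.

```python
import cmath

# Check numerically that phi(t/n)**n approaches e^{i t mu} for an
# Exponential(1) random variable, whose characteristic function is
# phi(t) = 1 / (1 - i t) and whose mean is mu = 1.
MU, T = 1.0, 2.0

def phi(t: float) -> complex:
    return 1 / (1 - 1j * t)

for n in (10, 1_000, 100_000):
    print(f"n={n:>6}: phi(t/n)^n = {phi(T / n) ** n:.6f}")
print(f"target : e^(i*t*mu) = {cmath.exp(1j * T * MU):.6f}")
```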
References
- Grimmett, G. R. and Stirzaker, D. R. (1992). Probability and Random Processes, 2nd Edition. Clarendon Press, Oxford. ISBN 0-19-853665-8.
- Durrett, Richard (1995). Probability: Theory and Examples, 2nd Edition. Duxbury Press.
External links
- Jakob Bernoulli, Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis, 1713, Chapter 4 (translated into English by Oscar Sheynin)
- Hacking, Ian. (1983) "19th-century Cracks in the Concept of Determinism"