Misplaced Pages

Law of large numbers: Difference between revisions

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.
Revision as of 01:28, 9 March 2007 by DanielCD (minor edit, no edit summary)
Latest revision as of 02:23, 14 January 2025 by Linearoperator (minor edit, summary: Clarification of language)
{{Distinguish|Law of truly large numbers}}
{|align=right
|__TOC__
|}
The '''law of large numbers''' is a fundamental concept in ] and ] that describes how the ''']''' of a '''randomly selected large sample''' from a ] is likely to be close to the '''] of the whole population''':


{{Short description|Averages of repeated trials converge to the expected value}}
<blockquote>
If an event of probability ''p''
is observed repeatedly during independent repetitions,
the ratio of the observed frequency of that event to the total number of repetitions
] towards ''p''
as the number of repetitions becomes arbitrarily large.
</blockquote>


{{Probability fundamentals}}
More simply, as an experiment is repeated over and over, the observed relative frequency of an event approaches its actual probability. This is important because it means that if we do not know the probability of some natural event (say the chance that it will rain), we can estimate that probability through observation and experimentation.


] of the law of large numbers using a particular run of rolls of a single ]. As the number of rolls in this run increases, the average of the values of all the results approaches 3.5. Although each run would show a distinctive shape over a small number of throws (at the left), over a large number of rolls (to the right) the shapes would be extremely similar.|thumb|right|286x286px]]
==Origins of the term==
] first described the law of large numbers as so simple that even the stupidest men instinctively know it is true.<ref>Jakob Bernoulli, ''Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis'', 1713, Chapter 4 (translated into English by Oscar Sheynin)</ref> Despite this, it took him over 20 years to develop a sufficiently rigorous mathematical proof, which he published in ''Ars Conjectandi'' (The Art of Conjecturing) in 1713. He named this his "Golden Theorem", but it became generally known as "Bernoulli's Theorem". In 1835, ] further described it under the name "La loi de grands nombres" (The law of large numbers).<ref>Hacking, Ian. (1983) "19th-century Cracks in the Concept of Determinism"</ref> Thereafter, it was known under both names, but the "Law of large numbers" is most frequently used.


In ], the '''law of large numbers''' ('''LLN''') is a ] that states that the ] of the results obtained from a large number of independent random samples converges to the true value, if it exists.<ref name=":0">{{Cite book|title=A Modern Introduction to Probability and Statistics| url=https://archive.org/details/modernintroducti00fmde|url-access=limited| last=Dekking|first=Michel| publisher=Springer| year=2005|isbn=9781852338961|pages=–190}}</ref> More formally, the LLN states that given a sample of independent and identically distributed values, the ] converges to the true ].
After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including ], ], ], ], ], ] and ]. These further studies have given rise to two prominent forms of the law of large numbers. One is called the "weak" law and the other the "strong" law. These forms do not describe different laws but instead refer to different ways of describing the ] of the sample mean with the population mean.


The LLN is important because it guarantees stable long-term results for the averages of some ] ].<ref name=":0" /><ref>{{Cite journal|last1=Yao|first1=Kai|last2=Gao|first2=Jinwu|date=2016|title=Law of Large Numbers for Uncertain Random Variables|journal=IEEE Transactions on Fuzzy Systems| volume=24| issue=3| pages=615–621| doi=10.1109/TFUZZ.2015.2466080| s2cid=2238905|issn=1063-6706}}</ref> For example, while a ] may lose ] in a single spin of the ] wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. Importantly, the law applies (as the name indicates) only when a ''large number'' of observations are considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the ]).
<!-- We need a discussion of the Uniform Law of Large Numbers also. -->
==Probability ==


The LLN only applies to the ''average'' of the results obtained from repeated trials and claims that this average converges to the expected value; it does not claim that the ''sum'' of ''n'' results gets close to the expected value times ''n'' as ''n'' increases.
The law of large numbers is called "the first fundamental theorem of probability". It was derived by analysis of games of chance, such as the drawing of lots or the casting of dice, which are governed by probability. For example, a fair, six-sided die may come up "1", "2", "3", "4", "5" or "6" dots on a single throw, and if these dots are counted as numbers, it is possible to calculate the value of an "average" roll.


Throughout its history, many mathematicians have refined this law. Today, the LLN is used in many fields including statistics, probability theory, economics, and insurance.<ref name=":1">{{Cite web |last=Sedor |first=Kelly |title=The Law of Large Numbers and its Applications |url=https://www.lakeheadu.ca/sites/default/files/uploads/77/images/Sedor%20Kelly.pdf}}</ref>
We know that over many rolls, one roll in six will result in a "1". Likewise, one roll in six will result in "2" and so on through all 6 possible rolls. Counting the results as numbers gives:


==Examples==
:::<math>\frac{1}{6} \times 1 + \frac{1}{6} \times 2 + \frac{1}{6} \times 3 +\frac{1}{6} \times 4 +\frac{1}{6} \times 5 +\frac{1}{6} \times 6 = \frac{1+2+3+4+5+6}{6}=\frac{21}{6}= 3.5 </math>
For example, a single roll of a six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal ]. Therefore, the ] of the roll is:


<math display="block"> \frac{1+2+3+4+5+6}{6} = 3.5</math>


According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the ]) will approach 3.5, with the precision increasing as more dice are rolled.
Of course, there is no single side of the die that has 3.5 dots, and so no single roll of the die will result in a value of "3.5". But after a large number of rolls are recorded, the average score of all rolls will approach 3.5.
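The die example can be illustrated with a short simulation. The following is a minimal sketch, not part of the article's sources: the function name <code>running_average_of_rolls</code> and the fixed seed are assumptions chosen for illustration, and Python's standard pseudorandom generator stands in for a fair die.

```python
import random


def running_average_of_rolls(num_rolls, seed=0):
    """Return the running averages of simulated fair six-sided die rolls.

    The name and the seed are illustrative assumptions; the seed only
    makes the run reproducible.
    """
    rng = random.Random(seed)
    total = 0
    averages = []
    for i in range(1, num_rolls + 1):
        total += rng.randint(1, 6)  # one roll, values 1..6 inclusive
        averages.append(total / i)  # average of the first i rolls
    return averages


if __name__ == "__main__":
    averages = running_average_of_rolls(100_000)
    # Print the running average at a few checkpoints; it should drift
    # toward the expected value 3.5 as the number of rolls grows.
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, round(averages[n - 1], 4))
```

Early checkpoints typically wander noticeably around 3.5, while later ones tend to stay close to it; any single run is only an illustration, not a proof.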


It follows from the law of large numbers that the ] of success in a series of ]s will converge to the theoretical probability. For a ], the expected value is the theoretical probability of success, and the average of ''n'' such variables (assuming they are ]) is precisely the relative frequency.
Furthermore, as the number of rolls grows, the proportion of rolls showing any particular result ("1", "2", "3", "4", "5" or "6") will increasingly approach 1/6.
]
For example, a ] toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability that the outcome will be heads is equal to {{frac|1|2}}. Therefore, according to the law of large numbers, the proportion of heads in a "large" number of coin flips "should be" roughly {{frac|1|2}}. In particular, the proportion of heads after ''n'' flips will ] ] to {{frac|1|2}} as ''n'' approaches infinity.


Although the proportion of heads (and tails) approaches {{frac|1|2}}, almost surely the ] in the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected difference grows, but at a slower rate than the number of flips.
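The contrast between the shrinking ratio and the growing absolute difference can be seen numerically. This is a rough sketch under stated assumptions: the helper <code>mean_abs_difference</code>, the number of runs, and the seed are hypothetical choices, and the average over runs is only an estimate of the expected difference.

```python
import random


def mean_abs_difference(num_flips, num_runs=200, seed=0):
    """Estimate the expected |heads - tails| after `num_flips` fair flips.

    Averages the absolute difference over `num_runs` independent runs.
    Function name, run count and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(num_runs):
        heads = sum(rng.random() < 0.5 for _ in range(num_flips))
        tails = num_flips - heads
        total += abs(heads - tails)
    return total / num_runs


if __name__ == "__main__":
    for n in (100, 10_000):
        diff = mean_abs_difference(n)
        # The absolute difference grows with n, but diff / n shrinks.
        print(n, diff, diff / n)
```

In such runs the estimated difference is larger for 10,000 flips than for 100 flips, while its ratio to the number of flips is smaller, in line with the statement that the difference grows more slowly than the number of flips.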
Misunderstanding this law may lead to the belief that if an event has not occurred in many trials, the probability of it occurring in a subsequent trial is increased. For example, the probability of a fair die turning up a 3 is 1 in 6. The LLN says that over a large number of throws, the observed frequency of 3s will be close to 1 in 6 (16 2/3%). This, however, does not mean that if the first 5 throws of the die do not turn up a 3, the sixth throw is more likely to produce a 3. Each roll is independent, the probability of rolling a 3 remains exactly the same from roll to roll, and the value of any one individual observation cannot be predicted based upon past observations. Such erroneous predictions are known as the ].
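The independence claim can be checked empirically. The sketch below is an illustration only: the function <code>frequency_after_drought</code> and its parameters are hypothetical names, and it measures how often a 3 appears immediately after at least five consecutive rolls without a 3.

```python
import random


def frequency_after_drought(num_rolls, drought=5, face=3, seed=0):
    """Relative frequency of `face` right after `drought` or more misses.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    run_without_face = 0  # current streak of rolls not showing `face`
    opportunities = 0     # rolls made right after a long enough streak
    hits = 0              # how many of those rolls showed `face`
    for _ in range(num_rolls):
        roll = rng.randint(1, 6)
        if run_without_face >= drought:
            opportunities += 1
            if roll == face:
                hits += 1
        run_without_face = 0 if roll == face else run_without_face + 1
    if opportunities == 0:
        return float("nan")  # no streak long enough was observed
    return hits / opportunities


if __name__ == "__main__":
    # Should be close to 1/6 (about 0.1667), not higher.
    print(frequency_after_drought(300_000))
```

If the gambler's fallacy were correct, this conditional frequency would exceed 1/6; in simulation it stays near 1/6.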


Another good example of the LLN is the ]. These methods are a broad class of ]al ]s that rely on repeated ] to obtain numerical results. The larger the number of repetitions, the better the approximation tends to be. The reason that this method is important is mainly that, sometimes, it is difficult or impossible to use other approaches.<ref>{{Cite journal|last1=Kroese|first1=Dirk P.| last2=Brereton|first2=Tim| last3=Taimre|first3=Thomas|last4=Botev|first4=Zdravko I.|date=2014|title=Why the Monte Carlo method is so important today|journal=Wiley Interdisciplinary Reviews: Computational Statistics| language=en| volume=6| issue=6|pages=386–392|doi=10.1002/wics.1314|s2cid=18521840}}</ref>
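As a small illustration of repeated random sampling, the following sketch estimates π by sampling points in the unit square and counting those that fall inside the quarter circle. It is one textbook example chosen here for illustration, not a method described by the cited source; the function name <code>estimate_pi</code> and the seed are assumptions.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Monte Carlo estimate of pi from uniform points in the unit square.

    The fraction of points with x^2 + y^2 <= 1 estimates pi / 4.
    Name and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    # More samples tend to give a better approximation.
    for n in (100, 10_000, 200_000):
        print(n, estimate_pi(n))
```

Each sample is a Bernoulli trial with success probability π/4, so the law of large numbers is what justifies using the observed fraction as an estimate.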
The "law of large numbers" is sometimes invoked to refer to the notion that even very improbable events may occur when a sufficiently large number of instances are given.


==Statistics==
== Limitation ==
The average of the results obtained from a large number of trials may fail to converge in some cases. For instance, the average of ''n'' results taken from the ] or some ]s (α<1) will not converge as ''n'' becomes larger; the reason is ].<ref>{{Cite book |title=A modern introduction to probability and statistics: understanding why and how |date=2005 |publisher=Springer |isbn=978-1-85233-896-1 |editor-last=Dekking |editor-first=Michel |series=Springer texts in statistics |location=London |pages=187}}</ref> The Cauchy distribution and the Pareto distribution represent two cases: the Cauchy distribution does not have an expectation,<ref>{{Cite book|title=A Modern Introduction to Probability and Statistics|url=https://archive.org/details/modernintroducti00fmde|url-access=limited| last=Dekking|first=Michel|publisher=Springer|year=2005|isbn=9781852338961|pages=}}</ref> whereas the expectation of the Pareto distribution (''α''<1) is infinite.<ref>{{Cite book|title=A Modern Introduction to Probability and Statistics|url=https://archive.org/details/modernintroducti00fmde| url-access=limited| last=Dekking|first=Michel| publisher=Springer| year=2005| isbn=9781852338961| pages=}}</ref> One way to generate the Cauchy-distributed example is to set the random numbers equal to the tangent of an angle uniformly distributed between −90° and +90°.<ref>{{Cite journal |last1=Pitman |first1=E. J. G. |last2=Williams |first2=E. J. |date=1967 |title=Cauchy-Distributed Functions of Cauchy Variates |journal=The Annals of Mathematical Statistics |volume=38 |issue=3 |pages=916–918 |doi=10.1214/aoms/1177698885 |jstor=2239008 |issn=0003-4851|doi-access=free }}</ref> The ] is zero, but the expected value does not exist, and indeed the average of ''n'' such variables has the same distribution as one such variable. It does not converge in probability toward zero (or any other value) as ''n'' goes to infinity.
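The failure of convergence for the Cauchy case can be sketched using the tangent construction described above. The code below is an illustration under assumptions: the helper names, the number of runs, and the use of the interquartile range as a measure of spread are choices made here, not taken from the cited sources. For a standard Cauchy variable the quartiles are −1 and +1, so the interquartile range is 2.

```python
import math
import random
import statistics


def cauchy_sample_mean(n, rng):
    """Average of n values tan(angle), angle uniform on (-90°, +90°)."""
    half_pi = math.pi / 2
    return sum(math.tan(rng.uniform(-half_pi, half_pi)) for _ in range(n)) / n


def iqr_of_sample_means(n, num_runs=2000, seed=0):
    """Interquartile range of `num_runs` independent sample means of size n.

    Names, run count and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    means = [cauchy_sample_mean(n, rng) for _ in range(num_runs)]
    q1, _, q3 = statistics.quantiles(means, n=4)  # the three quartile cut points
    return q3 - q1


if __name__ == "__main__":
    # The spread of the sample mean stays near 2 instead of shrinking.
    for n in (1, 10, 100):
        print(n, round(iqr_of_sample_means(n), 3))
```

For a distribution with a finite mean the same measurement would shrink as ''n'' grows; here it stays roughly constant, consistent with the average of ''n'' such variables having the same distribution as a single one.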
The law of large numbers was derived through analysis of probability. Statistics evolve from probability theory and in statistics, the law of large numbers means that a large sample is more likely than a smaller sample to have the characteristics of the whole.


And if the trials embed a ], typical in human economic/rational behaviour, the law of large numbers does not help in solving the bias. Even if the number of trials is increased the selection bias remains.
To illustrate, picture a water bottling plant producing 10,000 bottles of water a day. The plant manager measures the volume of water in a large number (say 200) of the bottles it produced that day, and finds that the average is 0.997 liters. In this case, the plant manager may conclude that the average of all bottles that day is not quite 1 liter.


==History==
==Forms and Proofs ==
] is an example of the law of large numbers. Initially, there are ] molecules on the left side of a barrier (magenta line) and none on the right. The barrier is removed, and the solute diffuses to fill the whole container.{{ubl|style=margin-top:1em|
===The weak law===
''Top:'' With a single molecule, the motion appears to be quite random.
The '''weak law of large numbers''' states that if ''X''<sub>1</sub>, ''X''<sub>2</sub>, ''X''<sub>3</sub>, ... is an infinite ] of ]s, where all the random variables have the same ] μ and ] σ<sup>2</sup>; and are ] (i.e., the ] between any two of them is zero), then the sample average
|''Middle:'' With more molecules, there is clearly a trend where the solute fills the container more and more uniformly, but there are also random fluctuations.
|''Bottom:'' With an enormous number of solute molecules (too many to see), the randomness is essentially gone: The solute appears to move smoothly and systematically from high-concentration areas to low-concentration areas. In realistic situations, chemists can describe diffusion as a deterministic macroscopic phenomenon (see ]s), despite its underlying random nature.}}]]


The Italian mathematician ] (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials.<ref>{{cite book |last=Mlodinow |first=L. |title=The Drunkard's Walk |location=New York |publisher=Random House |year=2008 |page=50}}</ref><ref name=":1" /> This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by ].<ref>{{cite book |first=Jakob |last=Bernoulli |title=Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis |language=la |year=1713 |chapter=4 |translator-first=Oscar |translator-last=Sheynin}}</ref><ref name=":1" /> It took him over 20 years to develop a sufficiently rigorous mathematical proof which was published in his {{lang|la|italic=yes|]}} (''The Art of Conjecturing'') in 1713. He named this his "Golden Theorem" but it became generally known as "'''Bernoulli's theorem'''". This should not be confused with ], named after Jacob Bernoulli's nephew ]. In 1837, ] further described it under the name {{lang|fr|"la loi des grands nombres"}} ("the law of large numbers").<ref>Poisson names the "law of large numbers" ({{lang|fr|la loi des grands nombres}}) in: {{cite book |first=S. D. |last=Poisson |title=Probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilitiés |location=Paris, France |publisher=Bachelier |year=1837 |page= |language=fr}} He attempts a two-part proof of the law on pp. 139–143 and pp. 277 ff.</ref><ref>{{cite journal |last=Hacking |first=Ian |year=1983 |title=19th-century Cracks in the Concept of Determinism |journal=Journal of the History of Ideas |volume=44 |issue=3 |pages=455–475 |doi=10.2307/2709176 |jstor=2709176}}</ref><ref name=":1" /> Thereafter, it was known under both names, but the "law of large numbers" is most frequently used.
:<math>\overline{X}_n=(X_1+\cdots+X_n)/n</math>


After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including ],<ref>{{Cite journal | last1 = Tchebichef | first1 = P. | title = Démonstration élémentaire d'une proposition générale de la théorie des probabilités | doi = 10.1515/crll.1846.33.259 | journal = Journal für die reine und angewandte Mathematik | volume = 1846 | issue = 33 | pages = 259–267 | year = 1846 | s2cid = 120850863 | url = https://zenodo.org/record/1448850 |language=fr}}</ref> ], ], ], ] and ].<ref name=":1" /> Markov showed that the law can apply to a random variable that does not have a finite variance under some other weaker assumption, and Khinchin showed in 1929 that if the series consists of independent identically distributed random variables, it suffices that the ] exists for the weak law of large numbers to be true.{{sfn|Seneta|2013}}<ref name=EncMath>{{cite web| author1=Yuri Prohorov|author-link1=Yuri Vasilyevich Prokhorov|title=Law of large numbers| url=https://www.encyclopediaofmath.org/index.php/Law_of_large_numbers| website=Encyclopedia of Mathematics |publisher=EMS Press}}</ref> These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law, in reference to two different modes of ] of the cumulative sample means to the expected value; in particular, as explained below, the strong form implies the weak.{{sfn|Seneta|2013}}
converges in probability to μ.


==Forms==
Or, somewhat more tersely:
There are two different versions of the '''law of large numbers''' that are described below. They are called the '''''strong law''' of large numbers'' and the '''''weak law''' of large numbers''.<ref>{{Cite book|title=A Course in Mathematical Statistics and Large Sample Theory| last1=Bhattacharya|first1=Rabi| last2=Lin|first2=Lizhen| last3=Patrangenaru|first3=Victor| date=2016| publisher=Springer New York| isbn=978-1-4939-4030-1| series=Springer Texts in Statistics| location=New York, NY| doi=10.1007/978-1-4939-4032-5}}</ref><ref name=":0" /> Stated for the case where ''X''<sub>1</sub>, ''X''<sub>2</sub>, ... is an infinite sequence of ] ] random variables with expected value E(''X''<sub>1</sub>) = E(''X''<sub>2</sub>) = ... = ''μ'', both versions of the law state that the sample average


<math display="block">\overline{X}_n=\frac1n(X_1+\cdots+X_n) </math>
For any positive number ε, no matter how small, we have


converges to the expected value:
:<math>\lim_{n\rightarrow\infty}\operatorname{P}\left(\left|\overline{X}_n-\mu\right|<\varepsilon\right)=1.</math>
{{NumBlk||<math display="block">\overline{X}_n \to \mu \quad\textrm{as}\ n \to \infty.</math>|{{EquationRef|1}}}}


(Lebesgue integrability of ''X<sub>j</sub>'' means that the expected value E(''X<sub>j</sub>'') exists according to Lebesgue integration and is finite. It does ''not'' mean that the associated probability measure is ] with respect to ].)
==== Proof ====

] is used to prove this result. Finite variance <math> \operatorname{Var} (X_i)=\sigma^2 </math> (for all <math>i</math>) and no correlation yield that
Introductory probability texts often additionally assume identical finite ] <math> \operatorname{Var} (X_i) = \sigma^2 </math> (for all <math>i</math>) and no correlation between random variables. In that case, the variance of the average of n random variables is
:<math>\operatorname{Var}(\overline{X}_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.</math>
<math display="block">\operatorname{Var}(\overline{X}_n) = \operatorname{Var}(\tfrac1n(X_1+\cdots+X_n)) = \frac{1}{n^2} \operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.</math>

which can be used to shorten and simplify the proofs. This assumption of finite ] is ''not necessary''. Large or infinite variance will make the convergence slower, but the LLN holds anyway.<ref name="TaoBlog">{{cite web|title=The strong law of large numbers – What's new|date=19 June 2008|url=http://terrytao.wordpress.com/2008/06/18/the-strong-law-of-large-numbers/|access-date=2012-06-09|publisher=Terrytao.wordpress.com}}</ref>
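The identity Var(X̄<sub>''n''</sub>) = ''σ''<sup>2</sup>/''n'' can be checked empirically. The sketch below assumes, purely for illustration, samples that are uniform on [0, 1), for which ''σ''<sup>2</sup> = 1/12; the function name and the number of runs are hypothetical.

```python
import random
import statistics


def variance_of_sample_mean(n, num_runs=5000, seed=0):
    """Empirical variance of the mean of n uniform [0, 1) samples.

    Computed across `num_runs` independent sample means. The uniform
    distribution, the name and the defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(num_runs)]
    return statistics.pvariance(means)


if __name__ == "__main__":
    sigma_squared = 1 / 12  # variance of a single uniform [0, 1) sample
    for n in (1, 5, 25):
        # Empirical value next to the theoretical sigma^2 / n.
        print(n, variance_of_sample_mean(n), sigma_squared / n)
```

The two printed columns should agree to within sampling error, and both shrink in proportion to 1/''n''.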

] of the random variables can be replaced by ]<ref>{{cite journal|last1=Etemadi|first1=N. Z.|date=1981|title=An elementary proof of the strong law of large numbers|journal=Wahrscheinlichkeitstheorie Verw Gebiete| volume=55| issue=1| pages=119–122| doi=10.1007/BF01013465|s2cid=122166046|doi-access=free}}</ref> or ]<ref>{{Cite journal| last=Kingman|first=J. F. C.|date=April 1978|title=Uses of Exchangeability|journal=The Annals of Probability| language=en| volume=6|issue=2|doi=10.1214/aop/1176995566|issn=0091-1798|doi-access=free}}</ref> in both versions of the law.

The difference between the strong and the weak version is concerned with the mode of convergence being asserted. For interpretation of these modes, see ].

===Weak law===
{{multiple image |width1=50 |image1=Blank300.png
|width2=100 |image2=Lawoflargenumbersanimation2.gif |footer=Simulation illustrating the law of large numbers. Each frame, a coin that is red on one side and blue on the other is flipped, and a dot is added in the corresponding column. A pie chart shows the proportion of red and blue so far. Notice that while the proportion varies significantly at first, it approaches 50% as the number of trials increases.
|width3=50 |image3=Blank300.png}}
The '''weak law of large numbers''' (also called Khinchin's law) states that given a collection of independent and identically distributed (iid) samples from a random variable with finite mean, the sample mean converges in probability to the expected value<ref>{{harvnb|Loève|1977|loc=Chapter 1.4, p. 14}}</ref>
{{NumBlk||<math display="block">
\overline{X}_n\ \overset{P}{\rightarrow}\ \mu \qquad\textrm{when}\ n \to \infty.
</math>|{{EquationRef|2}}}}

That is, for any positive number ''ε'',

<math display="block">
\lim_{n\to\infty}\Pr\!\left(\,|\overline{X}_n-\mu| < \varepsilon\,\right) = 1.
</math>

Interpreting this result, the weak law states that for any nonzero margin specified (''ε''), no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.
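To make the interpretation concrete, the following sketch estimates the probability that the average of ''n'' fair die rolls misses 3.5 by at least a margin ''ε'', and compares it with the bound ''σ''<sup>2</sup>/(''nε''<sup>2</sup>) obtained from Chebyshev's inequality in the finite-variance case. The function names, the choice ''ε'' = 0.25, and the number of runs are assumptions made for illustration.

```python
import random

DIE_MEAN = 3.5
DIE_VARIANCE = 35 / 12  # variance of a single fair six-sided die roll


def deviation_probability(n, eps, num_runs=1000, seed=0):
    """Estimate P(|average of n rolls - 3.5| >= eps) by simulation.

    Names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    misses = 0
    for _ in range(num_runs):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - DIE_MEAN) >= eps:
            misses += 1
    return misses / num_runs


def chebyshev_bound(n, eps):
    """Chebyshev upper bound sigma^2 / (n * eps^2), capped at 1."""
    return min(1.0, DIE_VARIANCE / (n * eps * eps))


if __name__ == "__main__":
    eps = 0.25
    for n in (10, 100, 1000):
        print(n, deviation_probability(n, eps), chebyshev_bound(n, eps))
```

The estimated probability falls toward zero as ''n'' grows, and the Chebyshev bound, although usually far from tight, also tends to zero, which is the content of the weak law in this setting.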

As mentioned earlier, the weak law applies in the case of i.i.d. random variables, but it also applies in some other cases. For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by ] as early as 1867. (If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first ''n'' values goes to zero as ''n'' goes to infinity.<ref name=EncMath/> As an example, assume that each random variable in the series follows a ] (normal distribution) with mean zero, but with variance equal to <math>2n/\log(n+1)</math>, which is not bounded. At each stage, the average will be normally distributed (as the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is ] to <math>n^2 / \log n</math>. The variance of the average is therefore asymptotic to <math>1 / \log n</math> and goes to zero.
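The unbounded-variance example can be verified numerically. The sketch below computes the variance of the average directly from the stated variances 2''k''/log(''k''+1) and compares it with 1/log ''n''; the function name is an assumption, and the natural logarithm is assumed.

```python
import math


def variance_of_average(n):
    """Variance of the average of n independent variables whose k-th
    variance is 2k / log(k + 1), as in the example above.

    The variance of the sum is the sum of the variances; dividing by
    n^2 gives the variance of the average. The name is illustrative.
    """
    total_variance = sum(2 * k / math.log(k + 1) for k in range(1, n + 1))
    return total_variance / (n * n)


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        # Compare the exact variance of the average with 1 / log n.
        print(n, variance_of_average(n), 1 / math.log(n))
```

The individual variances grow without bound, yet the variance of the average decreases slowly toward zero, roughly like 1/log ''n'', so Chebyshev's argument still applies.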

There are also examples of the weak law applying even though the expected value does not exist.

===Strong law===
The '''strong law of large numbers''' (also called Kolmogorov's law) states that the sample average converges almost surely to the expected value<ref>{{harvnb|Loève|1977|loc=Chapter 17.3, p. 251}}</ref>
{{NumBlk||<math display="block">
\overline{X}_n\ \overset{\text{a.s.}}{\longrightarrow}\ \mu \qquad\textrm{when}\ n \to \infty.
</math>|{{EquationRef|3}}}}

That is,

<math display="block">
\Pr\!\left( \lim_{n\to\infty}\overline{X}_n = \mu \right) = 1.
</math>

What this means is that the probability that, as the number of trials ''n'' goes to infinity, the average of the observations converges to the expected value, is equal to one. The modern proof of the strong law is more complex than that of the weak law, and relies on passing to an appropriate subsequence.<ref name="TaoBlog" />

The strong law of large numbers can itself be seen as a special case of the ]. This view justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-term average".

Law 3 is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). However, the weak law is known to hold in certain conditions where the strong law does not hold, and then the convergence is only weak (in probability). See ].

The strong law applies to independent identically distributed random variables having an expected value (like the weak law). This was proved by Kolmogorov in 1930. It can also apply in other cases. Kolmogorov also showed, in 1933, that if the variables are independent and identically distributed, then for the average to converge almost surely on ''something'' (this can be considered another statement of the strong law), it is necessary that they have an expected value (and then of course the average will converge almost surely on that).<ref name=EMStrong>{{cite web|author1=Yuri Prokhorov| title=Strong law of large numbers|url=https://www.encyclopediaofmath.org/index.php/Strong_law_of_large_numbers| website=Encyclopedia of Mathematics}}</ref>

If the summands are independent but not identically distributed, then
{{NumBlk||<math display="block">
\overline{X}_n - \operatorname{E}\big[\overline{X}_n\big]\ \overset{\text{a.s.}}{\longrightarrow}\ 0,
</math>|{{EquationRef|2}}}}

provided that each ''X''<sub>''k''</sub> has a finite second moment and

<math display="block">
\sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.
</math>

This statement is known as ''Kolmogorov's strong law'', see e.g. {{harvtxt|Sen|Singer|1993|loc=Theorem 2.3.10}}.

===Differences between the weak law and the strong law===
The ''weak law'' states that for a specified large ''n'', the average <math style="vertical-align:-.35em">\overline{X}_n</math> is likely to be near ''μ''.<ref>{{Cite web |title=What Is the Law of Large Numbers? (Definition) {{!}} Built In |url=https://builtin.com/data-science/law-of-large-numbers |access-date=2023-10-20 |website=builtin.com |language=en}}</ref> Thus, it leaves open the possibility that <math style="vertical-align:-.4em">|\overline{X}_n -\mu| > \varepsilon</math> happens an infinite number of times, although at infrequent intervals. (Not necessarily <math style="vertical-align:-.4em">|\overline{X}_n -\mu| \neq 0</math> for all ''n'').

The ''strong law'' shows that this ] will not occur. It does not imply that with probability 1, we have that for any {{math|''ε'' > 0}} the inequality <math style="vertical-align:-.4em">|\overline{X}_n -\mu| < \varepsilon</math> holds for all large enough ''n'', since the convergence is not necessarily uniform on the set where it holds.<ref>{{harvtxt|Ross|2009}}</ref>

The strong law does not hold in the following cases, but the weak law does.<ref name="Weak law converges to constant">{{cite book |last1=Lehmann |first1=Erich L. |last2=Romano |first2=Joseph P. |date=2006-03-30 |title=Weak law converges to constant |publisher=Springer |isbn=9780387276052 |url=https://books.google.com/books?id=K6t5qn-SEp8C&pg=PA432}}</ref><ref>{{cite journal| title=A Note on the Weak Law of Large Numbers for Exchangeable Random Variables |author1=Dguvl Hun Hong |author2=Sung Ho Lee |url=http://www.mathnet.or.kr/mathnet/kms_tex/31810.pdf |journal=Communications of the Korean Mathematical Society| volume=13|year=1998|issue=2|pages=385–391 |access-date=2014-06-28|archive-url=https://web.archive.org/web/20160701234328/http://www.mathnet.or.kr/mathnet/kms_tex/31810.pdf|archive-date=2016-07-01|url-status=dead}}</ref><!-- Stack Exchange is not a reliable source -->

{{ordered list
|1= Let X be an ] distributed random variable with parameter 1. The random variable <math>\sin(X)e^X X^{-1}</math> has no expected value according to Lebesgue integration, but using conditional convergence and interpreting the integral as a ], which is an improper ], we can say:

<math display="block"> E\left(\frac{\sin(X)e^X}{X}\right) =\ \int_{x=0}^{\infty}\frac{\sin(x)e^x}{x}e^{-x}dx = \frac{\pi}{2} </math>

|2= Let X be a ] distributed random variable with probability 0.5. The random variable <math>2^X(-1)^X X^{-1}</math> does not have an expected value in the conventional sense because the infinite ] is not absolutely convergent, but using conditional convergence, we can say:

<math display="block"> E\left(\frac{2^X(-1)^X}{X}\right) =\ \sum_{x=1}^{\infty}\frac{2^x(-1)^x}{x}2^{-x}=-\ln(2) </math>

|3= If the ] of a random variable is

<math display="block">\begin{cases}
1-F(x)&=\frac{e}{2x\ln(x)},&x \ge e \\
F(x)&=\frac{e}{-2x\ln(-x)},&x \le -e
\end{cases}</math>

then it has no expected value, but the weak law is true.<ref>{{cite web|last1=Mukherjee|first1=Sayan|title=Law of large numbers| url=http://www.isds.duke.edu/courses/Fall09/sta205/lec/lln.pdf|access-date=2014-06-28|archive-url=https://web.archive.org/web/20130309032810/http://www.isds.duke.edu/courses/Fall09/sta205/lec/lln.pdf|archive-date=2013-03-09| url-status=dead}}</ref><ref>{{cite web|last1=J. Geyer|first1=Charles|title=Law of large numbers| url=http://www.stat.umn.edu/geyer/8112/notes/weaklaw.pdf}}</ref>

|4= Let ''X''<sub>''k''</sub> be plus or minus <math display="inline">\sqrt{k/\log\log\log k}</math> (starting at sufficiently large ''k'' so that the denominator is positive) with probability {{frac|1|2}} for each.<ref name=EMStrong/> The variance of ''X''<sub>''k''</sub> is then <math>k/\log\log\log k.</math> Kolmogorov's strong law does not apply because the partial sum in his criterion up to ''k''&nbsp;=&nbsp;''n'' is asymptotic to <math>\log n/\log\log\log n</math> and this is unbounded. If we replace the random variables with Gaussian variables having the same variances, namely <math display="inline">k/\log\log\log k</math>, then the average at any point will also be normally distributed. The width of the distribution of the average will tend toward zero (standard deviation asymptotic to <math display="inline">1/\sqrt{2\log\log\log n}</math>), but for a given ''ε'', there is a probability, which does not go to zero with ''n'', that the average sometime after the ''n''th trial will come back up to ''ε''. Since the width of the distribution of the average is not zero, it must have a positive lower bound ''p''(''ε''), which means there is a probability of at least ''p''(''ε'') that the average will attain ε after ''n'' trials. It will happen with probability ''p''(''ε'')/2 before some ''m'' which depends on ''n''. But even after ''m'', there is still a probability of at least ''p''(''ε'') that it will happen. (This seems to indicate that ''p''(''ε'')=1 and the average will attain ε an infinite number of times.)
}}
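The conditional convergence in the second example of the list above can be checked numerically. Since 2<sup>''x''</sup>(−1)<sup>''x''</sup>/''x'' · 2<sup>−''x''</sup> = (−1)<sup>''x''</sup>/''x'', the series reduces to the alternating harmonic series. The sketch below, with hypothetical function names, compares its partial sums with −ln 2 and shows that the sum of absolute values keeps growing.

```python
import math


def alternating_partial_sum(num_terms):
    """Partial sum of (-1)^x / x for x = 1 .. num_terms."""
    return sum((-1) ** x / x for x in range(1, num_terms + 1))


def absolute_partial_sum(num_terms):
    """Partial sum of |(-1)^x / x| = 1 / x, i.e. the harmonic series."""
    return sum(1 / x for x in range(1, num_terms + 1))


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        # The signed sum approaches -ln(2); the absolute sum diverges.
        print(n, alternating_partial_sum(n), absolute_partial_sum(n))
    print("-ln(2) =", -math.log(2))
```

The signed partial sums settle near −ln 2 while the absolute partial sums grow without bound (roughly like ln ''n''), which is why the expected value does not exist in the conventional, absolutely convergent sense.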

===Uniform laws of large numbers===
There are extensions of the law of large numbers to collections of estimators, where the convergence is uniform over the collection; thus the name ''uniform law of large numbers''.

Suppose ''f''(''x'',''θ'') is some ] defined for ''θ'' ∈ Θ, and continuous in ''θ''. Then for any fixed ''θ'', the sequence {''f''(''X''<sub>1</sub>,''θ''), ''f''(''X''<sub>2</sub>,''θ''), ...} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[''f''(''X'',''θ'')]. This is the ''pointwise'' (in ''θ'') convergence.

A particular example of a '''uniform law of large numbers''' states the conditions under which the convergence happens ''uniformly'' in ''θ''. If<ref>{{harvnb|Newey|McFadden|1994|loc=Lemma 2.4}}</ref><ref>{{cite journal|doi=10.1214/aoms/1177697731|title=Asymptotic Properties of Non-Linear Least Squares Estimators|year=1969|last1=Jennrich|first1=Robert I.|journal=The Annals of Mathematical Statistics|volume=40|issue=2|pages=633–643|doi-access=free}}</ref>

# ''Θ'' is compact,
# ''f''(''x'',''θ'') is continuous at each ''θ'' ∈ Θ for almost all ''x''s, and is a measurable function of ''x'' at each ''θ'',
# there exists a dominating function ''d''(''x'') such that E[''d''(''X'')] < ∞, and <math display="block"> \left\| f(x,\theta) \right\| \leq d(x) \quad\text{for all}\ \theta\in\Theta.</math>

Then E[''f''(''X'',''θ'')] is continuous in ''θ'', and

<math display="block">
\sup_{\theta\in\Theta} \left\| \frac 1 n \sum_{i=1}^n f(X_i,\theta) - \operatorname{E}[f(X,\theta)] \right\| \overset{\mathrm{P}}{\rightarrow} \ 0.
</math>

This result is useful for deriving the consistency of a large class of estimators (see Extremum estimator).
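The uniform convergence statement can be illustrated numerically. The following is a minimal sketch under assumptions that are not part of the article: it takes ''f''(''x'',''θ'') = cos(''θx'') with ''X'' standard normal, so that E[''f''(''X'',''θ'')] = exp(−''θ''<sup>2</sup>/2) and the dominating function is ''d''(''x'') = 1, and it approximates the supremum over the compact set Θ = [0, 2] by a maximum over a finite grid. The names <code>sup_deviation</code> and <code>thetas</code> are hypothetical.

```python
import math
import random


def sup_deviation(n, thetas, seed=0):
    """Largest gap, over a grid of theta values, between the sample mean of
    cos(theta * X_i) and its exact expectation exp(-theta**2 / 2), where the
    X_i are i.i.d. standard normal draws."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    worst = 0.0
    for theta in thetas:
        sample_mean = sum(math.cos(theta * x) for x in xs) / n
        expectation = math.exp(-theta * theta / 2.0)
        worst = max(worst, abs(sample_mean - expectation))
    return worst


# Finite grid standing in for the compact parameter set [0, 2] (an assumption).
thetas = [2.0 * j / 20 for j in range(21)]

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, sup_deviation(n, thetas))
```

The same sample is reused for every ''θ'', which is what distinguishes the uniform statement from a collection of separate pointwise statements. A grid maximum is only a proxy for the true supremum.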

===Borel's law of large numbers===
'''Borel's law of large numbers''', named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event is expected to occur approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if ''E'' denotes the event in question, ''p'' its probability of occurrence, and ''N<sub>n</sub>''(''E'') the number of times ''E'' occurs in the first ''n'' trials, then with probability one,<ref>{{cite journal | url=https://www.jstor.org/stable/2323947 | jstor=2323947 | doi=10.2307/2323947 | last1=Wen | first1=Liu | title=An Analytic Technique to Prove Borel's Strong Law of Large Numbers | journal=The American Mathematical Monthly | date=1991 | volume=98 | issue=2 | pages=146–148 }}</ref>
<math display="block"> \frac{N_n(E)}{n}\to p\text{ as }n\to\infty.</math>

This theorem makes rigorous the intuitive notion of probability as the expected long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.

'''Chebyshev's inequality'''. Let ''X'' be a random variable with finite expected value ''μ'' and finite non-zero variance ''σ''<sup>2</sup>. Then for any real number {{math|''k'' > 0}},

<math display="block">
\Pr(|X-\mu|\geq k\sigma) \leq \frac{1}{k^2}.
</math>

==Proof of the weak law==
Given ''X''<sub>1</sub>, ''X''<sub>2</sub>, ... an infinite sequence of independent and identically distributed (i.i.d.) random variables with finite expected value <math>E(X_1)=E(X_2)=\cdots=\mu<\infty</math>, we are interested in the convergence of the sample average

<math display="block">\overline{X}_n=\tfrac1n(X_1+\cdots+X_n). </math>

The weak law of large numbers states:
{{NumBlk||<math display="block">
\overline{X}_n\ \overset{P}{\rightarrow}\ \mu \qquad\textrm{when}\ n \to \infty.
</math>|{{EquationRef|2}}}}

===Proof using Chebyshev's inequality assuming finite variance===
This proof uses the assumption of finite variance <math> \operatorname{Var} (X_i)=\sigma^2 </math> (for all <math>i</math>). The independence of the random variables implies no correlation between them, and we have that

<math display="block">
\operatorname{Var}(\overline{X}_n) = \operatorname{Var}(\tfrac1n(X_1+\cdots+X_n)) = \frac{1}{n^2} \operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
</math>

The common mean μ of the sequence is the mean of the sample average:

<math display="block">
E(\overline{X}_n) = \mu.
</math>

Using Chebyshev's inequality on <math>\overline{X}_n </math> results in

<math display="block">
\operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \leq \frac{\sigma^2}{n\varepsilon^2}.
</math>

This may be used to obtain the following:

<math display="block">
\operatorname{P}( \left| \overline{X}_n-\mu \right| < \varepsilon) = 1 - \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \geq 1 - \frac{\sigma^2}{n \varepsilon^2 }.
</math>

As ''n'' approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained
{{NumBlk||<math display="block">
\overline{X}_n\ \overset{P}{\rightarrow}\ \mu \qquad\textrm{when}\ n \to \infty.
</math>|{{EquationRef|2}}}}
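
The Chebyshev bound used in this proof can be compared with simulated frequencies. The sketch below is illustrative only and assumes fair six-sided die rolls, for which μ = 3.5 and σ<sup>2</sup> = 35/12; the function names <code>tail_frequency</code> and <code>chebyshev_bound</code> are hypothetical.

```python
import random


def tail_frequency(n, eps, trials, seed=0):
    """Fraction of simulated sample means of n fair die rolls that lie at
    least eps away from the true mean 3.5."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - 3.5) >= eps:
            hits += 1
    return hits / trials


def chebyshev_bound(n, eps, variance=35 / 12):
    """Upper bound sigma^2 / (n * eps^2) on P(|mean - mu| >= eps), capped at 1."""
    return min(1.0, variance / (n * eps * eps))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, tail_frequency(n, 0.5, 2000), chebyshev_bound(n, 0.5))
```

The simulated frequencies are typically far below the bound: Chebyshev's inequality is enough to prove convergence but is usually not a sharp estimate of the tail probability.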


===Proof using convergence of characteristic functions===
By Taylor's theorem for complex functions, the characteristic function of any random variable, ''X'', with finite mean μ, can be written as

<math display="block">\varphi_X(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.</math>

All ''X''<sub>1</sub>, ''X''<sub>2</sub>, ... have the same characteristic function, so we will simply denote this ''φ''<sub>''X''</sub>.

Among the basic properties of characteristic functions there are

<math display="block">\varphi_{\frac 1 n X}(t)= \varphi_X(\tfrac t n) \quad \text{and} \quad
\varphi_{X+Y}(t) = \varphi_X(t) \varphi_Y(t) \quad </math> if ''X'' and ''Y'' are independent.

These rules can be used to calculate the characteristic function of <math>\overline{X}_n</math> in terms of ''φ''<sub>''X''</sub>:

<math display="block">\varphi_{\overline{X}_n}(t)= \left[\varphi_X\left({t \over n}\right)\right]^n = \left[1 + i\mu{t \over n} + o\left({t \over n}\right)\right]^n \, \rightarrow \, e^{it\mu}, \quad \text{as} \quad n \to \infty.</math>

The limit ''e''<sup>''itμ''</sup> is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, <math> \overline{X}_n</math> converges in distribution to μ:

<math display="block">\overline{X}_n \, \overset{\mathcal D}{\rightarrow} \, \mu \qquad\text{for}\qquad n \to \infty.</math>

μ is a constant, which implies that convergence in distribution to μ and convergence in probability to μ are equivalent (see Convergence of random variables). Therefore,
{{NumBlk||<math display="block">
\overline{X}_n\ \overset{P}{\rightarrow}\ \mu \qquad\textrm{when}\ n \to \infty.
</math>|{{EquationRef|2}}}}

This shows that the sample mean converges in probability to the derivative of the characteristic function at the origin divided by ''i'' (which equals μ), as long as that derivative exists.


==Proof of the strong law==
We give a relatively simple proof of the strong law under the assumptions that the <math>X_i</math> are independent and identically distributed, <math> {\mathbb E}[X_i] =: \mu < \infty </math>, <math> \operatorname{Var} (X_i)=\sigma^2 < \infty</math>, and <math> {\mathbb E}[X_i^4] =: \tau < \infty </math>.

Let us first note that without loss of generality we can assume that <math>\mu = 0</math> by centering. In this case, the strong law says that
<math display="block">
\Pr\!\left( \lim_{n\to\infty}\overline{X}_n = 0 \right) = 1,
</math>
or, writing <math>S_n = X_1+\cdots+X_n</math>,
<math display="block">
\Pr\left(\omega: \lim_{n\to\infty}\frac{S_n(\omega)}n = 0 \right) = 1.
</math>
It is equivalent to show that
<math display="block">
\Pr\left(\omega: \lim_{n\to\infty}\frac{S_n(\omega)}n \neq 0 \right) = 0.
</math>
Note that
<math display="block">
\lim_{n\to\infty}\frac{S_n(\omega)}n \neq 0 \iff \exists\epsilon>0, \left|\frac{S_n(\omega)}n\right| \ge \epsilon\ \mbox{infinitely often},
</math>
and thus to prove the strong law we need to show that for every <math>\epsilon > 0</math>, we have
<math display="block">
\Pr\left(\omega: |S_n(\omega)| \ge n\epsilon \mbox{ infinitely often} \right) = 0.
</math>
Define the events <math> A_n = \{\omega : |S_n| \ge n\epsilon\}</math>. If we can show that
<math display="block">
\sum_{n=1}^\infty \Pr(A_n) <\infty,
</math>
then the Borel–Cantelli lemma implies the result. So let us estimate <math>\Pr(A_n)</math>.

We compute
<math display="block">
{\mathbb E}[S_n^4] = {\mathbb E}\left[\left(\sum_{i=1}^n X_i\right)^4\right] = {\mathbb E}\left[\sum_{1 \le i,j,k,l\le n} X_iX_jX_kX_l\right].
</math>
We first claim that every term of the form <math>X_i^3X_j, X_i^2X_jX_k, X_iX_jX_kX_l</math> where all subscripts are distinct, must have zero expectation. This is because <math>{\mathbb E}[X_i^3X_j] = {\mathbb E}[X_i^3]{\mathbb E}[X_j]</math> by independence, and the last factor is zero; the other terms are handled similarly. Therefore the only terms in the sum with nonzero expectation are <math>{\mathbb E}[X_i^4]</math> and <math>{\mathbb E}[X_i^2X_j^2]</math>. Since the <math>X_i</math> are identically distributed, all of these are the same, and moreover <math>{\mathbb E}[X_i^2X_j^2]=({\mathbb E}[X_i^2])^2</math>.

There are <math>n</math> terms of the form <math>{\mathbb E}[X_i^4]</math> and <math>3 n (n-1)</math> terms of the form <math>({\mathbb E}[X_i^2])^2</math>, and so
<math display="block">
{\mathbb E}[S_n^4] = n \tau + 3n(n-1)\sigma^4.
</math>
Note that the right-hand side is a quadratic polynomial in <math>n</math>, and as such there exists a <math>C>0</math> such that <math> {\mathbb E}[S_n^4] \le Cn^2</math> for <math>n</math> sufficiently large. By Markov's inequality,
<math display="block">
\Pr(|S_n| \ge n \epsilon) \le \frac1{(n\epsilon)^4}{\mathbb E}[S_n^4] \le \frac{C}{\epsilon^4 n^2},
</math>
for <math>n</math> sufficiently large, and therefore this series is summable. Since this holds for any <math>\epsilon > 0</math>, we have established the strong LLN.


Another proof was given by Etemadi.<ref>{{cite journal |last1=Etemadi |first1=Nasrollah |title=An elementary proof of the strong law of large numbers |journal=Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete |date=1981 |volume=55 |pages=119–122 |publisher=Springer|doi=10.1007/BF01013465 |s2cid=122166046 |doi-access=free }}</ref>

For a proof without the added assumption of a finite fourth moment, see Section 22 of Billingsley.<ref>{{cite book|last = Billingsley | first = Patrick| title = Probability and Measure|date = 1979}}</ref>
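
The fourth-moment count used in the proof above, namely that the fourth moment of the centered sum ''S''<sub>''n''</sub> equals ''nτ'' + 3''n''(''n'' − 1)''σ''<sup>4</sup>, can be checked exactly in a small case. The sketch below assumes, purely for illustration, that each ''X''<sub>''i''</sub> is ±1 with probability 1/2 (so μ = 0, ''σ''<sup>2</sup> = 1 and ''τ'' = 1) and enumerates all 2<sup>''n''</sup> outcomes; the function names are hypothetical.

```python
from itertools import product


def fourth_moment_of_sum(n):
    """Exact E[S_n^4] for S_n = X_1 + ... + X_n with X_i = +1 or -1, each with
    probability 1/2, computed by enumerating all 2**n sign patterns."""
    total = 0
    for signs in product((-1, 1), repeat=n):
        total += sum(signs) ** 4
    return total / 2 ** n


def fourth_moment_formula(n, tau=1, sigma2=1):
    """Closed form n*tau + 3*n*(n-1)*sigma^4 from the proof, with sigma^4
    written as the square of the variance sigma2."""
    return n * tau + 3 * n * (n - 1) * sigma2 ** 2


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, fourth_moment_of_sum(n), fourth_moment_formula(n))
```

For ''n'' = 2 both functions return 8, and for ''n'' = 3 both return 21. The enumeration cost doubles with each extra variable, so this is a check for small ''n'' only.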

== Consequences ==
The law of large numbers makes it possible to recover, from a realization of the sequence, not only the expectation of an unknown distribution but also any feature of the probability distribution.<ref name=":0" /> By applying Borel's law of large numbers, one could easily obtain the probability mass function. For each event in the objective probability mass function, one could approximate the probability of the event's occurrence with the proportion of times that the event occurs. The larger the number of repetitions, the better the approximation. For the continuous case, take <math>C=(a-h,a+h]</math> for small positive ''h''. Then, for large ''n'':

<math display="block"> \frac{N_n(C)}{n}\thickapprox
p = P(X\in C) = \int_{a-h}^{a+h} f(x) \, dx
\thickapprox
2hf(a)</math>

With this method, one can cover the whole x-axis with a grid (with grid size 2h) and obtain a bar graph which is called a histogram.
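
The approximation ''N''<sub>''n''</sub>(''C'')/''n'' ≈ 2''hf''(''a'') can be turned into a density estimate by dividing the relative frequency by 2''h''. The following is a minimal sketch under an assumption not made in the article: the samples are drawn from the exponential distribution with rate 1, whose density is ''f''(''a'') = e<sup>−''a''</sup> for ''a'' ≥ 0. The function name <code>density_estimate</code> is hypothetical.

```python
import math
import random


def density_estimate(a, h, n, seed=0):
    """Estimate the density at a as N_n(C) / (n * 2h), where N_n(C) counts
    how many of n exponential(1) samples fall in C = (a - h, a + h]."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        x = rng.expovariate(1.0)
        if a - h < x <= a + h:
            count += 1
    return count / n / (2 * h)


if __name__ == "__main__":
    for n in (1000, 100000):
        print(n, density_estimate(1.0, 0.05, n), math.exp(-1.0))
```

Two approximations are involved: the relative frequency approximates the probability of ''C'' (this is where the law of large numbers enters), and the probability of ''C'' approximates 2''hf''(''a'') only when ''h'' is small.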

== Applications ==
One application of the LLN is an important method of approximation known as the Monte Carlo method,<ref name=":1" /> which uses a random sampling of numbers to approximate numerical results. The algorithm to compute an integral of f(x) on an interval [a, b] is as follows:<ref name=":1" />

# Simulate uniform random variables X<sub>1</sub>, X<sub>2</sub>, ..., X<sub>n</sub>. This can be done using software or a random number table that gives U<sub>1</sub>, U<sub>2</sub>, ..., U<sub>n</sub>, independent and identically distributed (i.i.d.) random variables on [0,1]. Then let X<sub>i</sub> = a+(b - a)U<sub>i</sub> for i = 1, 2, ..., n. Then X<sub>1</sub>, X<sub>2</sub>, ..., X<sub>n</sub> are independent and identically distributed uniform random variables on [a, b].
# Evaluate f(X<sub>1</sub>), f(X<sub>2</sub>), ..., f(X<sub>n</sub>).
# Take the average of f(X<sub>1</sub>), f(X<sub>2</sub>), ..., f(X<sub>n</sub>) by computing <math>(b-a)\tfrac{f(X_1)+f(X_2)+\cdots+f(X_n)}{n}</math>. By the Strong Law of Large Numbers, this converges to <math>(b-a)E(f(X_1))</math> = <math>(b-a)\int_{a}^{b} f(x)\tfrac{1}{b-a}{dx}</math> = <math>\int_{a}^{b} f(x){dx}</math>.

We can find the integral of <math>f(x) = \cos^2(x)\sqrt{x^3+1}</math> on [-1,2]. Using traditional methods to compute this integral is very difficult, so the Monte Carlo method can be used here.<ref name=":1" /> Using the above algorithm, we get

<math>\int_{-1}^{2} f(x){dx}</math> = 0.905 when n=25

and

<math>\int_{-1}^{2} f(x){dx}</math> = 1.028 when n=250

We observe that as ''n'' increases, the estimate moves closer to the actual value of the integral, which is

<math>\int_{-1}^{2} f(x){dx}</math> = 1.000194

With the larger sample, the approximation of the integral was closer to its true value, and thus more accurate, as the LLN suggests.<ref name=":1" />

Another example is the integration of <math>f(x) = \frac{e^x-1}{e-1}</math> on [0,1].<ref name=":2">{{Citation |last=Reiter |first=Detlev |title=The Monte Carlo Method, an Introduction |date=2008 |url=http://link.springer.com/10.1007/978-3-540-74686-7_3 |work=Computational Many-Particle Physics |series=Lecture Notes in Physics |volume=739 |pages=63–78 |editor-last=Fehske |editor-first=H. |access-date=2023-12-08 |place=Berlin, Heidelberg |publisher=Springer Berlin Heidelberg |language=en |doi=10.1007/978-3-540-74686-7_3 |isbn=978-3-540-74685-0 |editor2-last=Schneider |editor2-first=R. |editor3-last=Weiße |editor3-first=A.}}</ref> Using the Monte Carlo method and the LLN, we can see that as the number of samples increases, the numerical value gets closer to 0.4180233.<ref name=":2" />
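
The three-step algorithm above can be sketched in a few lines. This is an illustration rather than a reference implementation: the function names <code>monte_carlo_integral</code> and <code>g</code> are hypothetical, the pseudo-random generator of the Python standard library stands in for the random number source, and the second example is assumed to be integrated over [0, 1], where its exact value is (e − 2)/(e − 1) ≈ 0.4180233.

```python
import math
import random


def monte_carlo_integral(f, a, b, n, seed=0):
    """Estimate the integral of f over [a, b] as (b - a) times the average
    of f at n independent uniform points X_i = a + (b - a) * U_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = a + (b - a) * rng.random()  # step 1: uniform sample on [a, b]
        total += f(x)                   # step 2: evaluate f at the sample
    return (b - a) * total / n          # step 3: scaled average


def g(x):
    """Integrand of the second example: (e^x - 1) / (e - 1)."""
    return (math.exp(x) - 1.0) / (math.e - 1.0)


if __name__ == "__main__":
    exact = (math.e - 2.0) / (math.e - 1.0)
    for n in (25, 250, 25000):
        print(n, monte_carlo_integral(g, 0.0, 1.0, n), exact)
```

Individual estimates still fluctuate with the seed, so a larger ''n'' makes a small error likely rather than guaranteeing that every run improves on a shorter one.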



==Notes==
{{Reflist|2}}

==References==
{{refbegin}}
* {{cite book |last1=Grimmett |first1=G. R. |last2=Stirzaker |first2=D. R. | title=Probability and Random Processes |edition=2nd | publisher=Clarendon Press |location=Oxford | year=1992 | isbn=0-19-853665-8}}
* {{cite book | first=Richard |last=Durrett | title=Probability: Theory and Examples |edition=2nd | publisher=Duxbury Press | year=1995}}
* {{cite book | author=Martin Jacobsen | publisher= HCØ-tryk |location=Copenhagen | year=1992|title=Videregående Sandsynlighedsregning |language=da |trans-title=Advanced Probability Theory |edition=3rd | isbn=87-91180-71-6}}
* {{cite book
| last = Loève | first = Michel
| title = Probability theory 1
| year = 1977
| edition = 4th
| publisher = Springer
}}
* {{cite book
| last1 = Newey | first1 = Whitney K.
| last2 = McFadden | first2 = Daniel | author-link2 = Daniel McFadden
| title = Large sample estimation and hypothesis testing
| series = Handbook of econometrics |volume=IV |chapter=36
| year = 1994
| publisher = Elsevier Science
| pages = 2111–2245
}}
* {{cite book
| last = Ross | first = Sheldon
| title = A first course in probability
| year = 2009
| edition = 8th
| publisher = Prentice Hall
| isbn = 978-0-13-603313-4
}}
* {{cite book
| last1 = Sen | first1 = P. K.
| last2 = Singer | first2 = J. M.
| year = 1993
| title = Large sample methods in statistics
| publisher = Chapman & Hall
}}
* {{cite journal|author1-link=Eugene Seneta|last=Seneta|first=Eugene|title=A Tricentenary history of the Law of Large Numbers| journal=Bernoulli| volume=19| issue=4| pages=1088–1121| date=2013|doi=10.3150/12-BEJSP12|arxiv=1309.6488|s2cid=88520834}}


{{refend}}
==External links==
* {{springer|title=Law of large numbers|id=p/l057720}}
* {{MathWorld|urlname=WeakLawofLargeNumbers|title=Weak Law of Large Numbers}}
* {{MathWorld|urlname=StrongLawofLargeNumbers|title=Strong Law of Large Numbers}}
* by Yihui Xie using the ] package
* . "We don't believe in such laws as laws of large numbers. This is sort of, uh, old dogma, I think, that was cooked up by somebody " said Tim Cook and while: "However, the law of large numbers has nothing to do with large companies, large revenues, or large growth rates. The law of large numbers is a fundamental concept in probability theory and statistics, tying together theoretical probabilities that we can calculate to the actual outcomes of experiments that we empirically perform.'' explained ]''
{{Authority control}}



Latest revision as of 02:23, 14 January 2025

Not to be confused with Law of truly large numbers. Averages of repeated trials converge to the expected value
An illustration of the law of large numbers using a particular run of rolls of a single die. As the number of rolls in this run increases, the average of the values of all the results approaches 3.5. Although each run would show a distinctive shape over a small number of throws (at the left), over a large number of rolls (to the right) the shapes would be extremely similar.

In probability theory, the law of large numbers (LLN) is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the LLN states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.

The LLN is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. Importantly, the law applies (as the name indicates) only when a large number of observations are considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the gambler's fallacy).

The LLN only applies to the average of the results obtained from repeated trials and claims that this average converges to the expected value; it does not claim that the sum of n results gets close to the expected value times n as n increases.

Throughout its history, many mathematicians have refined this law. Today, the LLN is used in many fields including statistics, probability theory, economics, and insurance.

Examples

For example, a single roll of a six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal probability. Therefore, the expected value of the roll is:

<math display="block">\frac{1+2+3+4+5+6}{6}=3.5</math>

According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) will approach 3.5, with the precision increasing as more dice are rolled.

It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the theoretical probability of success, and the average of n such variables (assuming they are independent and identically distributed (i.i.d.)) is precisely the relative frequency.

This image illustrates the convergence of relative frequencies to their theoretical probabilities. The probability of picking a red ball from a sack is 0.4 and black ball is 0.6. The left plot shows the relative frequency of picking a black ball, and the right plot shows the relative frequency of picking a red ball, both over 10,000 trials. As the number of trials increases, the relative frequencies approach their respective theoretical probabilities, demonstrating the Law of Large Numbers.

For example, a fair coin toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability that the outcome will be heads is equal to 1⁄2. Therefore, according to the law of large numbers, the proportion of heads in a "large" number of coin flips "should be" roughly 1⁄2. In particular, the proportion of heads after n flips will almost surely converge to 1⁄2 as n approaches infinity.

Although the proportion of heads (and tails) approaches 1⁄2, almost surely the absolute difference in the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected difference grows, but at a slower rate than the number of flips.
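
The contrast between the proportion of heads and the absolute difference between heads and tails can be seen in a short simulation. The sketch below is illustrative only; the function name coin_summary is hypothetical and a pseudo-random generator stands in for a fair coin.

```python
import random


def coin_summary(n, seed=0):
    """Flip a simulated fair coin n times and return the proportion of heads
    together with the absolute difference between heads and tails."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    tails = n - heads
    return heads / n, abs(heads - tails)


if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        proportion, difference = coin_summary(n)
        print(n, proportion, difference, difference / n)
```

In a typical run the proportion settles near 1/2 and the ratio of the difference to n shrinks, while the difference itself tends to be larger for longer runs, although it fluctuates from seed to seed.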

Another good example of the LLN is the Monte Carlo method. These methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The larger the number of repetitions, the better the approximation tends to be. The reason that this method is important is mainly that, sometimes, it is difficult or impossible to use other approaches.

Limitation

The average of the results obtained from a large number of trials may fail to converge in some cases. For instance, the average of n results taken from the Cauchy distribution or some Pareto distributions (α<1) will not converge as n becomes larger; the reason is heavy tails. The Cauchy distribution and the Pareto distribution represent two cases: the Cauchy distribution does not have an expectation, whereas the expectation of the Pareto distribution (α<1) is infinite. One way to generate the Cauchy-distributed example is where the random numbers equal the tangent of an angle uniformly distributed between −90° and +90°. The median is zero, but the expected value does not exist, and indeed the average of n such variables has the same distribution as one such variable. It does not converge in probability toward zero (or any other value) as n goes to infinity.

If the trials embed a selection bias, as is typical in human economic or rational behaviour, the law of large numbers does not remove the bias. Even if the number of trials is increased, the selection bias remains.

History

Diffusion is an example of the law of large numbers. Initially, there are solute molecules on the left side of a barrier (magenta line) and none on the right. The barrier is removed, and the solute diffuses to fill the whole container.
  • Top: With a single molecule, the motion appears to be quite random.
  • Middle: With more molecules, there is clearly a trend where the solute fills the container more and more uniformly, but there are also random fluctuations.
  • Bottom: With an enormous number of solute molecules (too many to see), the randomness is essentially gone: The solute appears to move smoothly and systematically from high-concentration areas to low-concentration areas. In realistic situations, chemists can describe diffusion as a deterministic macroscopic phenomenon (see Fick's laws), despite its underlying random nature.

The Italian mathematician Gerolamo Cardano (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials. This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by Jacob Bernoulli. It took him over 20 years to develop a sufficiently rigorous mathematical proof which was published in his Ars Conjectandi (The Art of Conjecturing) in 1713. He named this his "Golden Theorem" but it became generally known as "Bernoulli's theorem". This should not be confused with Bernoulli's principle, named after Jacob Bernoulli's nephew Daniel Bernoulli. In 1837, S. D. Poisson further described it under the name "la loi des grands nombres" ("the law of large numbers"). Thereafter, it was known under both names, but the "law of large numbers" is most frequently used.

After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev, Markov, Borel, Cantelli, Kolmogorov and Khinchin. Markov showed that the law can apply to a random variable that does not have a finite variance under some other weaker assumption, and Khinchin showed in 1929 that if the series consists of independent identically distributed random variables, it suffices that the expected value exists for the weak law of large numbers to be true. These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law, in reference to two different modes of convergence of the cumulative sample means to the expected value; in particular, as explained below, the strong form implies the weak.

Forms

There are two different versions of the law of large numbers that are described below. They are called the strong law of large numbers and the weak law of large numbers. Stated for the case where X1, X2, ... is an infinite sequence of independent and identically distributed (i.i.d.) Lebesgue integrable random variables with expected value E(X1) = E(X2) = ... = μ, both versions of the law state that the sample average

<math display="block">\overline{X}_n=\frac1n(X_1+\cdots+X_n)</math>

converges to the expected value:

{{NumBlk||<math display="block">
\overline{X}_n \to \mu \quad\textrm{as}\ n \to \infty.
</math>|{{EquationRef|1}}}}

(Lebesgue integrability of Xj means that the expected value E(Xj) exists according to Lebesgue integration and is finite. It does not mean that the associated probability measure is absolutely continuous with respect to Lebesgue measure.)

Introductory probability texts often additionally assume identical finite variance <math> \operatorname{Var} (X_i)=\sigma^2 </math> (for all <math>i</math>) and no correlation between random variables. In that case, the variance of the average of n random variables is

<math display="block">
\operatorname{Var}(\overline{X}_n) = \operatorname{Var}(\tfrac1n(X_1+\cdots+X_n)) = \frac{1}{n^2} \operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},
</math>

which can be used to shorten and simplify the proofs. This assumption of finite variance is not necessary. Large or infinite variance will make the convergence slower, but the LLN holds anyway.

Mutual independence of the random variables can be replaced by pairwise independence or exchangeability in both versions of the law.

The difference between the strong and the weak version is concerned with the mode of convergence being asserted. For interpretation of these modes, see Convergence of random variables.

Weak law

Simulation illustrating the law of large numbers. Each frame, a coin that is red on one side and blue on the other is flipped, and a dot is added in the corresponding column. A pie chart shows the proportion of red and blue so far. Notice that while the proportion varies significantly at first, it approaches 50% as the number of trials increases.

The weak law of large numbers (also called Khinchin's law) states that given a collection of independent and identically distributed (iid) samples from a random variable with finite mean, the sample mean converges in probability to the expected value

{{NumBlk||<math display="block">
\overline{X}_n\ \overset{P}{\rightarrow}\ \mu \qquad\textrm{when}\ n \to \infty.
</math>|{{EquationRef|2}}}}

That is, for any positive number ε,

<math display="block">
\lim_{n\to\infty}\Pr\!\left(\,|\overline{X}_n-\mu| < \varepsilon\,\right) = 1.
</math>

Interpreting this result, the weak law states that for any nonzero margin specified (ε), no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.

As mentioned earlier, the weak law applies in the case of i.i.d. random variables, but it also applies in some other cases. For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. (If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first n values goes to zero as n goes to infinity. As an example, assume that each random variable in the series follows a Gaussian distribution (normal distribution) with mean zero, but with variance equal to <math>2n/\log(n+1)</math>, which is not bounded. At each stage, the average will be normally distributed (as the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is asymptotic to <math>n^2/\log n</math>. The variance of the average is therefore asymptotic to <math>1/\log n</math> and goes to zero.
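
The asymptotic claim in this example can be checked by direct summation, with no simulation needed. The sketch below assumes the natural logarithm and reads "variance equal to 2n/log(n+1)" as the variance of the n-th variable in the series; the function name variance_of_average is hypothetical.

```python
import math


def variance_of_average(n):
    """Variance of the average of the first n independent variables, where
    the k-th variable has variance 2k / log(k + 1): the sum of the variances
    divided by n squared."""
    total = sum(2 * k / math.log(k + 1) for k in range(1, n + 1))
    return total / n ** 2


if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        v = variance_of_average(n)
        # The last column should drift toward 1 if the variance of the
        # average is asymptotic to 1 / log n.
        print(n, v, v * math.log(n))
```

The product of the variance and log n approaches 1 only slowly, since the correction is of order 1/log n, but the variance of the average itself decreases steadily, which is all that Chebyshev's argument requires.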

There are also examples of the weak law applying even though the expected value does not exist.

Strong law

The strong law of large numbers (also called Kolmogorov's law) states that the sample average converges almost surely to the expected value

X ¯ n   a.s.   μ when   n . {\displaystyle {\overline {X}}_{n}\ {\overset {\text{a.s.}}{\longrightarrow }}\ \mu \qquad {\textrm {when}}\ n\to \infty .} 3

That is,

{\displaystyle \Pr \!\left(\lim _{n\to \infty }{\overline {X}}_{n}=\mu \right)=1.}

What this means is that the probability that, as the number of trials n goes to infinity, the average of the observations converges to the expected value, is equal to one. The modern proof of the strong law is more complex than that of the weak law, and relies on passing to an appropriate subsequence.
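
The strong law is a statement about a single sequence of observations. As a rough illustration, the sketch below follows one simulated run of fair die rolls (expected value 3.5) and prints the running average at a few points; the seed and run length are arbitrary assumptions.

```python
import random

def running_averages(n, seed=1):
    """Running averages of one simulated sequence of n fair die rolls."""
    rng = random.Random(seed)
    total = 0
    averages = []
    for i in range(1, n + 1):
        total += rng.randint(1, 6)
        averages.append(total / i)
    return averages

path = running_averages(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, path[n - 1])
```

A finite simulation cannot prove almost sure convergence; it only shows one path settling near 3.5.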

The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem. This view justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-term average".

Law 3 is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). However, the weak law is known to hold in certain conditions where the strong law does not hold; in those cases the convergence is only weak (in probability). See differences between the weak law and the strong law.

The strong law applies to independent identically distributed random variables having an expected value (like the weak law). This was proved by Kolmogorov in 1930. It can also apply in other cases. Kolmogorov also showed, in 1933, that if the variables are independent and identically distributed, then for the average to converge almost surely on something (this can be considered another statement of the strong law), it is necessary that they have an expected value (and then of course the average will converge almost surely on that).

If the summands are independent but not identically distributed, then

{\displaystyle {\overline {X}}_{n}-\operatorname {E} {\big [}{\overline {X}}_{n}{\big ]}\ {\overset {\text{a.s.}}{\longrightarrow }}\ 0,} (2)

provided that each Xk has a finite second moment and

{\displaystyle \sum _{k=1}^{\infty }{\frac {1}{k^{2}}}\operatorname {Var} [X_{k}]<\infty .}

This statement is known as Kolmogorov's strong law, see e.g. Sen & Singer (1993, Theorem 2.3.10).
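
To get a feel for the summability condition, the sketch below evaluates partial sums of Var[X_k]/k² for three hypothetical variance sequences chosen for this example: constant, growing like √k, and growing like k. Partial sums can only suggest convergence or divergence; they do not prove it.

```python
import math

def criterion_partial_sum(variance, n):
    """Partial sum of Var[X_k] / k^2 for k = 1..n, given a function k -> Var[X_k]."""
    return sum(variance(k) / k**2 for k in range(1, n + 1))

def constant_variance(k):
    return 1.0

def sqrt_variance(k):
    return math.sqrt(k)

def linear_variance(k):
    return float(k)

for name, var in (("constant", constant_variance),
                  ("sqrt", sqrt_variance),
                  ("linear", linear_variance)):
    print(name, [round(criterion_partial_sum(var, n), 4) for n in (10**2, 10**3, 10**5)])
```

The first two sequences give partial sums that level off, while the linear case reduces to the harmonic series and keeps growing, so the criterion is not met there.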

Differences between the weak law and the strong law

The weak law states that for a specified large n, the average {\displaystyle {\overline {X}}_{n}} is likely to be near μ. Thus, it leaves open the possibility that {\displaystyle |{\overline {X}}_{n}-\mu |>\varepsilon } happens an infinite number of times, although at infrequent intervals. (Not necessarily {\displaystyle |{\overline {X}}_{n}-\mu |\neq 0} for all n).

The strong law shows that this almost surely will not occur. In particular, it implies that with probability 1, for any ε > 0 the inequality {\displaystyle |{\overline {X}}_{n}-\mu |<\varepsilon } holds for all large enough n. How large n must be may depend on the outcome, since the convergence is not necessarily uniform on the set where it holds.

The strong law does not hold in the following cases, but the weak law does.

  1. Let X be an exponentially distributed random variable with parameter 1. The random variable {\displaystyle \sin(X)e^{X}X^{-1}} has no expected value according to Lebesgue integration, but using conditional convergence and interpreting the integral as a Dirichlet integral, which is an improper Riemann integral, we can say: {\displaystyle E\left({\frac {\sin(X)e^{X}}{X}}\right)=\ \int _{x=0}^{\infty }{\frac {\sin(x)e^{x}}{x}}e^{-x}dx={\frac {\pi }{2}}}
  2. Let X be a geometrically distributed random variable with probability 0.5. The random variable {\displaystyle 2^{X}(-1)^{X}X^{-1}} does not have an expected value in the conventional sense because the infinite series is not absolutely convergent, but using conditional convergence, we can say: {\displaystyle E\left({\frac {2^{X}(-1)^{X}}{X}}\right)=\ \sum _{x=1}^{\infty }{\frac {2^{x}(-1)^{x}}{x}}2^{-x}=-\ln(2)}
  3. If the cumulative distribution function of a random variable is {\displaystyle {\begin{cases}1-F(x)&={\frac {e}{2x\ln(x)}},&x\geq e\\F(x)&={\frac {e}{-2x\ln(-x)}},&x\leq -e\end{cases}}} then it has no expected value, but the weak law is true.
  4. Let Xk be plus or minus {\textstyle {\sqrt {k/\log \log \log k}}} (starting at sufficiently large k so that the denominator is positive) with probability 1⁄2 for each. The variance of Xk is then {\displaystyle k/\log \log \log k.} Kolmogorov's strong law does not apply because the partial sum in his criterion up to k = n is asymptotic to {\displaystyle \log n/\log \log \log n} and this is unbounded. If we replace the random variables with Gaussian variables having the same variances, namely {\textstyle k/\log \log \log k}, then the average at any point will also be normally distributed. The width of the distribution of the average will tend toward zero (standard deviation asymptotic to {\textstyle 1/{\sqrt {2\log \log \log n}}}), but for a given ε, there is a probability, which does not go to zero with n, that the average sometime after the nth trial will come back up to ε. Since the width of the distribution of the average is not zero, this probability must have a positive lower bound p(ε), which means there is a probability of at least p(ε) that the average will attain ε after n trials. It will happen with probability p(ε)/2 before some m which depends on n. But even after m, there is still a probability of at least p(ε) that it will happen. (This seems to indicate that p(ε)=1 and the average will attain ε an infinite number of times.)
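
For the geometric example in item 2, each term of the series simplifies to (−1)^x / x, because the factors 2^x and 2^(−x) cancel. The sketch below, with hypothetical helper names, compares the conditionally convergent partial sums with −ln 2 and shows that the sums of absolute values keep growing, which is why no expected value exists in the Lebesgue sense.

```python
import math

def conditional_sum(n_terms):
    """Partial sum of (-1)^x / x for x = 1..n_terms (terms taken in natural order)."""
    return sum((-1) ** x / x for x in range(1, n_terms + 1))

def absolute_sum(n_terms):
    """Partial sum of |(-1)^x / x| = 1 / x, i.e. the harmonic series."""
    return sum(1 / x for x in range(1, n_terms + 1))

for n_terms in (10, 1_000, 100_000):
    print(n_terms, conditional_sum(n_terms), -math.log(2), absolute_sum(n_terms))
```

The value −ln 2 depends on summing the terms in this particular order, which is the sense in which the expectation is only conditionally defined.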

Uniform laws of large numbers

There are extensions of the law of large numbers to collections of estimators, where the convergence is uniform over the collection; thus the name uniform law of large numbers.

Suppose f(x,θ) is some function defined for θ ∈ Θ, and continuous in θ. Then for any fixed θ, the sequence {f(X1,θ), f(X2,θ), ...} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[f(X,θ)]. This is the pointwise (in θ) convergence.

A particular example of a uniform law of large numbers states the conditions under which the convergence happens uniformly in θ. If

  1. Θ is compact,
  2. f(x,θ) is continuous at each θ ∈ Θ for almost all x, and is a measurable function of x at each θ,
  3. there exists a dominating function d(x) such that E[d(X)] < ∞, and {\displaystyle \left\|f(x,\theta )\right\|\leq d(x)\quad {\text{for all}}\ \theta \in \Theta .}

Then E[f(X,θ)] is continuous in θ, and

{\displaystyle \sup _{\theta \in \Theta }\left\|{\frac {1}{n}}\sum _{i=1}^{n}f(X_{i},\theta )-\operatorname {E} [f(X,\theta )]\right\|{\overset {\mathrm {P} }{\rightarrow }}\ 0.}

This result is useful to derive consistency of a large class of estimators (see Extremum estimator).
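
A small numerical sketch of uniform convergence follows. It assumes, purely for illustration, f(x, θ) = cos(θx) with X standard normal and Θ = [0, 2], so that E[f(X, θ)] = exp(−θ²/2) and d(x) = 1 is a dominating function. The supremum over Θ is approximated on a finite grid, which is a simplification.

```python
import math
import random

def sup_deviation(n, grid_size=21, seed=2):
    """Largest gap over a grid of theta values between the sample mean of
    cos(theta * X_i) and its expectation exp(-theta^2 / 2), for X_i ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    worst = 0.0
    for j in range(grid_size):
        theta = 2.0 * j / (grid_size - 1)      # grid on Theta = [0, 2]
        sample_mean = sum(math.cos(theta * x) for x in xs) / n
        expected = math.exp(-theta**2 / 2)     # E[cos(theta X)] for standard normal X
        worst = max(worst, abs(sample_mean - expected))
    return worst

devs = {n: sup_deviation(n) for n in (100, 1_000, 10_000)}
for n, d in devs.items():
    print(n, d)
```

The same sample is used for every θ, which matches the statement of the uniform law: one sequence of observations, with the worst case taken over the parameter.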

Borel's law of large numbers

Borel's law of large numbers, named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event is expected to occur approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if E denotes the event in question, p its probability of occurrence, and N_n(E) the number of times E occurs in the first n trials, then with probability one, {\displaystyle {\frac {N_{n}(E)}{n}}\to p{\text{ as }}n\to \infty .}

This theorem makes rigorous the intuitive notion of probability as the expected long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.
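
As an informal check of the relative-frequency interpretation, the sketch below counts how often a simulated fair die shows a six, an event with p = 1/6. The event, seed and sample sizes are assumptions made for this example.

```python
import random

def relative_frequency(n, seed=3):
    """Proportion of sixes, N_n(E) / n, in n simulated rolls of a fair die."""
    rng = random.Random(seed)
    count = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    return count / n

for n in (100, 10_000, 200_000):
    print(n, relative_frequency(n), 1 / 6)
```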

Chebyshev's inequality. Let X be a random variable with finite expected value μ and finite non-zero variance σ². Then for any real number k > 0,

{\displaystyle \Pr(|X-\mu |\geq k\sigma )\leq {\frac {1}{k^{2}}}.}

Proof of the weak law

Given X1, X2, ... an infinite sequence of i.i.d. random variables with finite expected value {\displaystyle E(X_{1})=E(X_{2})=\cdots =\mu <\infty }, we are interested in the convergence of the sample average

{\displaystyle {\overline {X}}_{n}={\tfrac {1}{n}}(X_{1}+\cdots +X_{n}).}

The weak law of large numbers states:

{\displaystyle {\overline {X}}_{n}\ {\overset {P}{\rightarrow }}\ \mu \qquad {\textrm {when}}\ n\to \infty .} (2)

Proof using Chebyshev's inequality assuming finite variance

This proof uses the assumption of finite variance {\displaystyle \operatorname {Var} (X_{i})=\sigma ^{2}} (for all {\displaystyle i}). The independence of the random variables implies no correlation between them, and we have that

{\displaystyle \operatorname {Var} ({\overline {X}}_{n})=\operatorname {Var} ({\tfrac {1}{n}}(X_{1}+\cdots +X_{n}))={\frac {1}{n^{2}}}\operatorname {Var} (X_{1}+\cdots +X_{n})={\frac {n\sigma ^{2}}{n^{2}}}={\frac {\sigma ^{2}}{n}}.}

The common mean μ of the sequence is the mean of the sample average:

{\displaystyle E({\overline {X}}_{n})=\mu .}

Using Chebyshev's inequality on {\displaystyle {\overline {X}}_{n}} results in

{\displaystyle \operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|\geq \varepsilon )\leq {\frac {\sigma ^{2}}{n\varepsilon ^{2}}}.}

This may be used to obtain the following:

{\displaystyle \operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|<\varepsilon )=1-\operatorname {P} (\left|{\overline {X}}_{n}-\mu \right|\geq \varepsilon )\geq 1-{\frac {\sigma ^{2}}{n\varepsilon ^{2}}}.}

As n approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained

{\displaystyle {\overline {X}}_{n}\ {\overset {P}{\rightarrow }}\ \mu \qquad {\textrm {when}}\ n\to \infty .} (2)
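
The bound σ²/(nε²) used in this proof can be compared with simulated frequencies. The sketch below assumes fair die rolls (μ = 3.5, σ² = 35/12) and ε = 0.5; function names, trial count and seed are illustrative choices.

```python
import random

MU = 3.5
SIGMA2 = 35 / 12  # variance of a single fair die roll

def empirical_tail(n, eps, trials=2000, seed=4):
    """Simulated frequency of the event |sample mean - MU| >= eps."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - MU) >= eps:
            exceed += 1
    return exceed / trials

def chebyshev_bound(n, eps):
    """Upper bound sigma^2 / (n * eps^2) from Chebyshev's inequality."""
    return SIGMA2 / (n * eps**2)

results = {n: (empirical_tail(n, 0.5), chebyshev_bound(n, 0.5)) for n in (20, 50, 200)}
for n, (freq, bound) in results.items():
    print(n, freq, bound)
```

The bound is usually far from tight, but it is enough for the proof because it tends to zero as n grows.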

Proof using convergence of characteristic functions

By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ, can be written as

{\displaystyle \varphi _{X}(t)=1+it\mu +o(t),\quad t\rightarrow 0.}

All X1, X2, ... have the same characteristic function, so we will simply denote this φX.

Among the basic properties of characteristic functions there are

{\displaystyle \varphi _{{\frac {1}{n}}X}(t)=\varphi _{X}({\tfrac {t}{n}})\quad {\text{and}}\quad \varphi _{X+Y}(t)=\varphi _{X}(t)\varphi _{Y}(t)\quad } if X and Y are independent.

These rules can be used to calculate the characteristic function of {\displaystyle {\overline {X}}_{n}} in terms of φX:

{\displaystyle \varphi _{{\overline {X}}_{n}}(t)=\left[\varphi _{X}\left({t \over n}\right)\right]^{n}=\left[1+i\mu {t \over n}+o\left({t \over n}\right)\right]^{n}\,\rightarrow \,e^{it\mu },\quad {\text{as}}\quad n\to \infty .}

The limit {\displaystyle e^{it\mu }} is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, {\displaystyle {\overline {X}}_{n}} converges in distribution to μ:

{\displaystyle {\overline {X}}_{n}\,{\overset {\mathcal {D}}{\rightarrow }}\,\mu \qquad {\text{for}}\qquad n\to \infty .}

μ is a constant, which implies that convergence in distribution to μ and convergence in probability to μ are equivalent (see Convergence of random variables). Therefore,

{\displaystyle {\overline {X}}_{n}\ {\overset {P}{\rightarrow }}\ \mu \qquad {\textrm {when}}\ n\to \infty .} (2)

This shows that the sample mean converges in probability to μ, which equals the derivative of the characteristic function at the origin divided by i, as long as that derivative exists.
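
The limit of the characteristic functions can be observed numerically. The sketch below assumes an Exponential(1) variable, whose characteristic function is 1/(1 − it) and whose mean is 1, and measures the distance between [φ(t/n)]^n and e^{it} at t = 2; the choice of distribution and of t is arbitrary.

```python
import cmath

def phi_exponential(t):
    """Characteristic function of an Exponential(1) variable, which has mean 1."""
    return 1 / (1 - 1j * t)

def phi_sample_mean(t, n):
    """Characteristic function of the mean of n i.i.d. Exponential(1) variables."""
    return phi_exponential(t / n) ** n

t = 2.0
limit = cmath.exp(1j * t * 1.0)  # e^{i t mu} with mu = 1
for n in (1, 10, 100, 10_000):
    print(n, abs(phi_sample_mean(t, n) - limit))
```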

Proof of the strong law

We give a relatively simple proof of the strong law under the assumptions that the {\displaystyle X_{i}} are iid, {\displaystyle {\mathbb {E} }[X_{i}]=:\mu <\infty }, {\displaystyle \operatorname {Var} (X_{i})=\sigma ^{2}<\infty }, and {\displaystyle {\mathbb {E} }[X_{i}^{4}]=:\tau <\infty }.

Let us first note that without loss of generality we can assume that {\displaystyle \mu =0} by centering. In this case, the strong law says that

{\displaystyle \Pr \!\left(\lim _{n\to \infty }{\overline {X}}_{n}=0\right)=1,} or, writing {\displaystyle S_{n}=X_{1}+\cdots +X_{n}} for the partial sums, {\displaystyle \Pr \left(\omega :\lim _{n\to \infty }{\frac {S_{n}(\omega )}{n}}=0\right)=1.} It is equivalent to show that {\displaystyle \Pr \left(\omega :\lim _{n\to \infty }{\frac {S_{n}(\omega )}{n}}\neq 0\right)=0.} Note that {\displaystyle \lim _{n\to \infty }{\frac {S_{n}(\omega )}{n}}\neq 0\iff \exists \epsilon >0,\left|{\frac {S_{n}(\omega )}{n}}\right|\geq \epsilon \ {\mbox{infinitely often}},} and thus to prove the strong law we need to show that for every {\displaystyle \epsilon >0}, we have {\displaystyle \Pr \left(\omega :|S_{n}(\omega )|\geq n\epsilon {\mbox{ infinitely often}}\right)=0.} Define the events {\displaystyle A_{n}=\{\omega :|S_{n}|\geq n\epsilon \}}. If we can show that {\displaystyle \sum _{n=1}^{\infty }\Pr(A_{n})<\infty ,} then the Borel-Cantelli Lemma implies the result. So let us estimate {\displaystyle \Pr(A_{n})}.

We compute {\displaystyle {\mathbb {E} }[S_{n}^{4}]={\mathbb {E} }\left[\left(\sum _{i=1}^{n}X_{i}\right)^{4}\right]={\mathbb {E} }\left[\sum _{1\leq i,j,k,l\leq n}X_{i}X_{j}X_{k}X_{l}\right].} We first claim that every term of the form {\displaystyle X_{i}^{3}X_{j},X_{i}^{2}X_{j}X_{k},X_{i}X_{j}X_{k}X_{l}} where all subscripts are distinct, must have zero expectation. This is because {\displaystyle {\mathbb {E} }[X_{i}^{3}X_{j}]={\mathbb {E} }[X_{i}^{3}]{\mathbb {E} }[X_{j}]} by independence, and the last factor is zero; the other terms are handled similarly. Therefore the only terms in the sum with nonzero expectation are {\displaystyle {\mathbb {E} }[X_{i}^{4}]} and {\displaystyle {\mathbb {E} }[X_{i}^{2}X_{j}^{2}]}. Since the {\displaystyle X_{i}} are identically distributed, all of these are the same, and moreover {\displaystyle {\mathbb {E} }[X_{i}^{2}X_{j}^{2}]=({\mathbb {E} }[X_{i}^{2}])^{2}}.

There are {\displaystyle n} terms of the form {\displaystyle {\mathbb {E} }[X_{i}^{4}]} and {\displaystyle 3n(n-1)} terms of the form {\displaystyle ({\mathbb {E} }[X_{i}^{2}])^{2}}, and so {\displaystyle {\mathbb {E} }[S_{n}^{4}]=n\tau +3n(n-1)\sigma ^{4}.} Note that the right-hand side is a quadratic polynomial in {\displaystyle n}, and as such there exists a {\displaystyle C>0} such that {\displaystyle {\mathbb {E} }[S_{n}^{4}]\leq Cn^{2}} for {\displaystyle n} sufficiently large. By Markov's inequality, {\displaystyle \Pr(|S_{n}|\geq n\epsilon )\leq {\frac {1}{(n\epsilon )^{4}}}{\mathbb {E} }[S_{n}^{4}]\leq {\frac {C}{\epsilon ^{4}n^{2}}},} for {\displaystyle n} sufficiently large, and therefore this series is summable. Since this holds for any {\displaystyle \epsilon >0}, we have established the Strong LLN.
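
The counting step can be verified exactly for small n. The sketch below assumes the simplest centered case, X_i = ±1 with probability 1/2 each, so that σ² = 1 and τ = 1 and the formula reads n + 3n(n − 1). It also counts the index tuples in which two distinct indices each appear twice. Both helpers are hypothetical names introduced for this check.

```python
from fractions import Fraction
from itertools import product

def fourth_moment_of_sum(n):
    """Exact E[S_n^4] for i.i.d. signs X_i = +1 or -1 with probability 1/2 each."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2**n)

def paired_index_tuples(n):
    """Count tuples (i, j, k, l) in which two distinct indices each appear exactly twice."""
    count = 0
    for idx in product(range(n), repeat=4):
        values = set(idx)
        if len(values) == 2 and all(idx.count(v) == 2 for v in values):
            count += 1
    return count

for n in range(1, 7):
    print(n, fourth_moment_of_sum(n), n + 3 * n * (n - 1), paired_index_tuples(n))
```

Exhaustive enumeration is only feasible for small n, but it confirms the 3n(n − 1) count that makes the fourth moment grow quadratically.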


Another proof was given by Etemadi.

For a proof without the added assumption of a finite fourth moment, see Section 22 of Billingsley.

Consequences

The law of large numbers makes it possible to recover not only the expectation of an unknown distribution from a realization of the sequence, but also any feature of the probability distribution. By applying Borel's law of large numbers, one can obtain the probability mass function: for each event in the probability mass function, the probability of the event's occurrence is approximated by the proportion of times that the event occurs. The larger the number of repetitions, the better the approximation. In the continuous case, take {\displaystyle C=(a-h,a+h]} for small positive h. Then, for large n:

{\displaystyle {\frac {N_{n}(C)}{n}}\thickapprox p=P(X\in C)=\int _{a-h}^{a+h}f(x)\,dx\thickapprox 2hf(a)}

With this method, one can cover the whole x-axis with a grid (with grid size 2h) and obtain a bar graph which is called a histogram.
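
A single bar of such a histogram can be sketched as follows. The example assumes standard normal samples, whose density at a = 0 is 1/√(2π) ≈ 0.3989, and a half-width h = 0.1; these choices and the function name are illustrative.

```python
import math
import random

def density_estimate(a, h, n, seed=5):
    """Estimate f(a) as N_n(C) / (2 h n), where C = (a - h, a + h] and the
    samples are standard normal."""
    rng = random.Random(seed)
    count = sum(1 for _ in range(n) if a - h < rng.gauss(0.0, 1.0) <= a + h)
    return count / (2 * h * n)

for n in (1_000, 20_000, 200_000):
    print(n, density_estimate(0.0, 0.1, n), 1 / math.sqrt(2 * math.pi))
```

Two approximations are at work: the relative frequency approximates P(X ∈ C) for large n, and P(X ∈ C) approximates 2h·f(a) for small h.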

Applications

One application of the LLN is an important method of approximation known as the Monte Carlo method, which uses a random sampling of numbers to approximate numerical results. The algorithm to compute an integral of f(x) on an interval [a,b] is as follows:

  1. Simulate uniform random variables X1, X2, ..., Xn. This can be done using software or a random number table that gives U1, U2, ..., Un, independent and identically distributed (i.i.d.) random variables on [0,1]. Then let Xi = a+(b - a)Ui for i = 1, 2, ..., n. Then X1, X2, ..., Xn are independent and identically distributed uniform random variables on [a,b].
  2. Evaluate f(X1), f(X2), ..., f(Xn)
  3. Take the average of f(X1), f(X2), ..., f(Xn) and multiply it by (b - a), computing {\displaystyle (b-a){\tfrac {f(X_{1})+f(X_{2})+...+f(X_{n})}{n}}}. By the Strong Law of Large Numbers, this converges to {\displaystyle (b-a)E(f(X_{1}))=(b-a)\int _{a}^{b}f(x){\tfrac {1}{b-a}}\,dx=\int _{a}^{b}f(x)\,dx.}

We can find the integral of {\displaystyle f(x)=\cos ^{2}(x){\sqrt {x^{3}+1}}} on [-1,2]. Using traditional methods to compute this integral is very difficult, so the Monte Carlo method can be used here. Using the above algorithm, we get

{\displaystyle \int _{-1}^{2}f(x)\,dx\approx 0.905} when n = 25

and

{\displaystyle \int _{-1}^{2}f(x)\,dx\approx 1.028} when n = 250.

The estimate changes as n increases. The value quoted for the integral itself is

{\displaystyle \int _{-1}^{2}f(x)\,dx=1.000194}

The estimate obtained with the larger n is closer to this value, as the LLN leads one to expect.

Another example is the integration of {\displaystyle f(x)={\frac {e^{x}-1}{e-1}}} on [0,1]. Using the Monte Carlo method and the LLN, we can see that as the number of samples increases, the numerical value gets closer to 0.4180233.
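
The three-step algorithm can be sketched in a few lines. The example below applies it to f(x) = (e^x − 1)/(e − 1) on [0, 1], whose integral has the closed form (e − 2)/(e − 1) ≈ 0.4180233. The function names, sample sizes and seed are assumptions made for this illustration, and Python's built-in pseudorandom generator stands in for the uniform variables U_i.

```python
import math
import random

def monte_carlo_integral(f, a, b, n, seed=6):
    """Estimate the integral of f over [a, b] as (b - a) times the average of f(X_i)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = a + (b - a) * rng.random()  # step 1: X_i = a + (b - a) U_i
        total += f(x)                   # step 2: evaluate f(X_i)
    return (b - a) * total / n          # step 3: scaled average

def f(x):
    return (math.exp(x) - 1) / (math.e - 1)

exact = (math.e - 2) / (math.e - 1)
for n in (25, 250, 200_000):
    print(n, monte_carlo_integral(f, 0.0, 1.0, n), exact)
```

Individual estimates fluctuate, so a larger n does not guarantee a smaller error in every run; the LLN only says that the estimates converge as n grows.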

Notes

  1. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. pp. 181–190. ISBN 9781852338961.
  2. Yao, Kai; Gao, Jinwu (2016). "Law of Large Numbers for Uncertain Random Variables". IEEE Transactions on Fuzzy Systems. 24 (3): 615–621. doi:10.1109/TFUZZ.2015.2466080. ISSN 1063-6706. S2CID 2238905.
  3. Sedor, Kelly. "The Law of Large Numbers and its Applications" (PDF).
  4. Kroese, Dirk P.; Brereton, Tim; Taimre, Thomas; Botev, Zdravko I. (2014). "Why the Monte Carlo method is so important today". Wiley Interdisciplinary Reviews: Computational Statistics. 6 (6): 386–392. doi:10.1002/wics.1314. S2CID 18521840.
  5. Dekking, Michel, ed. (2005). A modern introduction to probability and statistics: understanding why and how. Springer texts in statistics. London : Springer. p. 187. ISBN 978-1-85233-896-1.
  6. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. pp. 92. ISBN 9781852338961.
  7. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics. Springer. pp. 63. ISBN 9781852338961.
  8. Pitman, E. J. G.; Williams, E. J. (1967). "Cauchy-Distributed Functions of Cauchy Variates". The Annals of Mathematical Statistics. 38 (3): 916–918. doi:10.1214/aoms/1177698885. ISSN 0003-4851. JSTOR 2239008.
  9. Mlodinow, L. (2008). The Drunkard's Walk. New York: Random House. p. 50.
  10. Bernoulli, Jakob (1713). "4". Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis (in Latin). Translated by Sheynin, Oscar.
  11. Poisson names the "law of large numbers" (la loi des grands nombres) in: Poisson, S. D. (1837). Probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilitiés (in French). Paris, France: Bachelier. p. 7. He attempts a two-part proof of the law on pp. 139–143 and pp. 277 ff.
  12. Hacking, Ian (1983). "19th-century Cracks in the Concept of Determinism". Journal of the History of Ideas. 44 (3): 455–475. doi:10.2307/2709176. JSTOR 2709176.
  13. Tchebichef, P. (1846). "Démonstration élémentaire d'une proposition générale de la théorie des probabilités". Journal für die reine und angewandte Mathematik (in French). 1846 (33): 259–267. doi:10.1515/crll.1846.33.259. S2CID 120850863.
  14. Seneta 2013.
  15. Yuri Prohorov. "Law of large numbers". Encyclopedia of Mathematics. EMS Press.
  16. Bhattacharya, Rabi; Lin, Lizhen; Patrangenaru, Victor (2016). A Course in Mathematical Statistics and Large Sample Theory. Springer Texts in Statistics. New York, NY: Springer New York. doi:10.1007/978-1-4939-4032-5. ISBN 978-1-4939-4030-1.
  17. "The strong law of large numbers – What's new". Terrytao.wordpress.com. 19 June 2008. Retrieved 2012-06-09.
  18. Etemadi, N. Z. (1981). "An elementary proof of the strong law of large numbers". Wahrscheinlichkeitstheorie Verw Gebiete. 55 (1): 119–122. doi:10.1007/BF01013465. S2CID 122166046.
  19. Kingman, J. F. C. (April 1978). "Uses of Exchangeability". The Annals of Probability. 6 (2). doi:10.1214/aop/1176995566. ISSN 0091-1798.
  20. Loève 1977, Chapter 1.4, p. 14
  21. Loève 1977, Chapter 17.3, p. 251
  22. Yuri Prokhorov. "Strong law of large numbers". Encyclopedia of Mathematics.
  23. "What Is the Law of Large Numbers? (Definition) | Built In". builtin.com. Retrieved 2023-10-20.
  24. Ross (2009)
  25. Lehmann, Erich L.; Romano, Joseph P. (2006-03-30). Weak law converges to constant. Springer. ISBN 9780387276052.
  26. Dguvl Hun Hong; Sung Ho Lee (1998). "A Note on the Weak Law of Large Numbers for Exchangeable Random Variables" (PDF). Communications of the Korean Mathematical Society. 13 (2): 385–391. Archived from the original (PDF) on 2016-07-01. Retrieved 2014-06-28.
  27. Mukherjee, Sayan. "Law of large numbers" (PDF). Archived from the original (PDF) on 2013-03-09. Retrieved 2014-06-28.
  28. J. Geyer, Charles. "Law of large numbers" (PDF).
  29. Newey & McFadden 1994, Lemma 2.4
  30. Jennrich, Robert I. (1969). "Asymptotic Properties of Non-Linear Least Squares Estimators". The Annals of Mathematical Statistics. 40 (2): 633–643. doi:10.1214/aoms/1177697731.
  31. Wen, Liu (1991). "An Analytic Technique to Prove Borel's Strong Law of Large Numbers". The American Mathematical Monthly. 98 (2): 146–148. doi:10.2307/2323947. JSTOR 2323947.
  32. Etemadi, Nasrollah (1981). "An elementary proof of the strong law of large numbers". Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete. 55. Springer: 119–122. doi:10.1007/BF01013465. S2CID 122166046.
  33. Billingsley, Patrick (1979). Probability and Measure.
  34. Reiter, Detlev (2008), Fehske, H.; Schneider, R.; Weiße, A. (eds.), "The Monte Carlo Method, an Introduction", Computational Many-Particle Physics, Lecture Notes in Physics, vol. 739, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 63–78, doi:10.1007/978-3-540-74686-7_3, ISBN 978-3-540-74685-0, retrieved 2023-12-08

References

  • Grimmett, G. R.; Stirzaker, D. R. (1992). Probability and Random Processes (2nd ed.). Oxford: Clarendon Press. ISBN 0-19-853665-8.
  • Durrett, Richard (1995). Probability: Theory and Examples (2nd ed.). Duxbury Press.
  • Martin Jacobsen (1992). Videregående Sandsynlighedsregning [Advanced Probability Theory] (in Danish) (3rd ed.). Copenhagen: HCØ-tryk. ISBN 87-91180-71-6.
  • Loève, Michel (1977). Probability theory 1 (4th ed.). Springer.
  • Newey, Whitney K.; McFadden, Daniel (1994). "36". Large sample estimation and hypothesis testing. Handbook of econometrics. Vol. IV. Elsevier Science. pp. 2111–2245.
  • Ross, Sheldon (2009). A first course in probability (8th ed.). Prentice Hall. ISBN 978-0-13-603313-4.
  • Sen, P. K; Singer, J. M. (1993). Large sample methods in statistics. Chapman & Hall.
  • Seneta, Eugene (2013). "A Tricentenary history of the Law of Large Numbers". Bernoulli. 19 (4): 1088–1121. arXiv:1309.6488. doi:10.3150/12-BEJSP12. S2CID 88520834.
