A Selection of Problems from A.A. Markov’s Calculus of Probabilities: Calculating Probabilities of Repeated Independent Events, Part 1

Author(s): 
Alan Levine (Franklin and Marshall College)

 

In the remainder of Chapter IV of his Calculus of Probabilities, Markov proceeded to analyze the binomial distribution (as we would now call it), first covered in Chapter II, which can be viewed as the sum of independent Bernoulli random variables. He made use of some rather sophisticated arguments involving continued fractions, Stirling's approximation, and hypergeometric series. This material is considerably more advanced than what has come before.

 

To conclude the chapter, we return to an important question about the repetition of independent experiments that we studied in the second chapter.

Denoting the number of experiments by the letter \(n\) and assuming that, for each of them, the probability of event \(E\) is equal to \(p\), we found that the probability that event \(E\) occurs exactly \(m\) times in these \(n\) experiments is expressed by the product \[ \frac{1\cdot 2\cdot 3\cdot \cdots \cdot n}{1\cdot 2\cdot 3\cdot \cdots \cdot m \cdot 1\cdot 2\cdot 3\cdot \cdots \cdot (n-m)} p^m q^{n-m},\] where \(q = 1-p\).

Therefore, the probability that event \(E\) occurs more than \(l\) times in the \(n\) experiments considered is represented by the sum
\[ \frac{1\cdot 2\cdot 3\cdot \cdots \cdot n p^{l+1}q^{n-l-1}}{1\cdot 2\cdot 3\cdot \cdots \cdot (l+1) \cdot 1\cdot 2\cdot 3\cdot \cdots \cdot (n-l-1)} + \frac{1\cdot 2\cdot 3\cdot \cdots \cdot n p^{l+2}q^{n-l-2}}{1\cdot 2\cdot 3\cdot \cdots \cdot (l+2) \cdot 1\cdot 2\cdot 3\cdot \cdots \cdot (n-l-2)} + \cdots,\] which reduces to the product of the expression \[P =\frac{1\cdot 2\cdot 3\cdot \cdots \cdot n}{1\cdot 2\cdot 3\cdot \cdots \cdot (l+1) \cdot 1\cdot 2\cdot 3\cdot \cdots \cdot (n-l-1)}  p^{l+1}q^{n-l-1}\] and the sum \[S = 1+ \frac{n-l-1}{l+2} \frac{p}{q} + \frac{(n-l-1)(n-l-2)}{(l+2)(l+3)} \left( \frac{p}{q} \right)^2 + \cdots\]
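To see this factorization in action, here is a minimal Python sketch (an editorial addition, not part of Markov's text; the function names are ours and the sample parameters are arbitrary) that computes the tail probability once as the direct binomial sum and once as the product \(P\cdot S\):

```python
from math import comb

def tail_prob_direct(n, p, l):
    """P(E occurs more than l times in n trials), summed term by term."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m) for m in range(l + 1, n + 1))

def tail_prob_factored(n, p, l):
    """The same probability written as the product P * S from the text."""
    q = 1 - p
    P = comb(n, l + 1) * p**(l + 1) * q**(n - l - 1)
    S = term = 1.0
    for k in range(1, n - l):                       # terms of S, built recursively
        term *= (n - l - k) / (l + 1 + k) * (p / q)
        S += term
    return P * S

print(tail_prob_direct(30, 0.4, 15))
print(tail_prob_factored(30, 0.4, 15))   # the two values should agree to roundoff
```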

For the approximate calculation of \(P\) for large values of \(n\), \(l+1\) and \(n-l-1\), we can use Stirling's Formula,25 which provides a series of inequalities, of which we show here only two of the simplest: \[P < P_1 = \sqrt{\frac{n}{2\pi (l+1)(n-l-1)}}\left(\frac{np}{l+1}\right)^{l+1} \left(\frac{nq}{n-l-1}\right)^{n-l-1}\] and \[\frac{P}{P_1} > H = e^{\frac{1}{12n} - \frac{1}{12(l+1)} - \frac{1}{12(n-l-1)}}.\]
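Taken together, these inequalities say that \(H P_1 < P < P_1\). The following Python sketch (our addition; the exact \(P\) is computed in log space with `lgamma` so that nothing underflows) lets one check them numerically:

```python
from math import lgamma, log, exp, pi

def stirling_bounds(n, p, l):
    """Return (P, P1, H); the inequalities above assert H*P1 < P < P1."""
    q = 1 - p
    a, b = l + 1, n - l - 1
    logP = (lgamma(n + 1) - lgamma(a + 1) - lgamma(b + 1)
            + a * log(p) + b * log(q))               # exact P = C(n,a) p^a q^b
    logP1 = (0.5 * log(n / (2 * pi * a * b))
             + a * log(n * p / a) + b * log(n * q / b))
    H = exp(1 / (12 * n) - 1 / (12 * a) - 1 / (12 * b))
    return exp(logP), exp(logP1), H

P, P1, H = stirling_bounds(200, 0.3, 75)
print(H * P1 < P < P1)                        # expected: True

P, P1, H = stirling_bounds(6520, 3/5, 4042)   # the example worked out in Part 3
print(P, P1, H * P1)                          # here the gap between P and H*P1 is tiny
```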

 


 


[25] See [Robbins 1955] for further justification of these inequalities.

 

A Selection of Problems from A.A. Markov’s Calculus of Probabilities: Calculating Probabilities of Repeated Independent Events, Part 2


 

Turning to the calculation of the sum \(S\), we show that we can take advantage of an expansion as a continued fraction,26 which follows as a particular case of Gauss’ Formula:

\[\frac{F(\alpha, \beta+1, \gamma+1, x)}{F(\alpha, \beta, \gamma, x)} = \cfrac{1}{1-\cfrac{ax}{1-\cfrac{bx}{1-\cfrac{cx}{1-\cfrac{dx}{1-\ddots}}}}}\] where \(F(\alpha, \beta, \gamma, x)\) and \(F(\alpha, \beta + 1, \gamma + 1, x)\) denote the “hypergeometric” series:27

\[ 1 + \frac{\alpha\cdot \beta}{1\cdot \gamma} x + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1\cdot 2\cdot \gamma (\gamma + 1)} x^2 + \cdots \] and \[ 1 + \frac{\alpha(\beta+1)}{1\cdot(\gamma+1)} x + \frac{\alpha(\alpha+1)(\beta+1)(\beta+2)}{1\cdot 2\cdot (\gamma + 1)(\gamma + 2)} x^2 + \cdots \] and the coefficients \(a, b, c, d, \dots\) are determined by the equations
\[\begin{aligned} a & = \frac{\alpha (\gamma-\beta)}{\gamma(\gamma + 1)}, & b & = \frac{(\beta+1)(\gamma+ 1 - \alpha)}{(\gamma+1)(\gamma+2)}, \\
    c&= \frac{(\alpha+1)(\gamma+ 1 - \beta)}{(\gamma+2)(\gamma+3)}, & d &= \frac{(\beta+2)(\gamma+ 2 - \alpha)}{(\gamma+3)(\gamma+4)}, \dots
\end{aligned}\]
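As a sanity check on Gauss’ Formula, the following Python sketch (ours, with arbitrarily chosen parameter values) compares the ratio of the two series with a truncation of the continued fraction:

```python
def F(alpha, beta, gamma, x, terms=200):
    """Partial sum of the hypergeometric series 1 + (αβ)/(1·γ)x + ... (|x| < 1)."""
    s = t = 1.0
    for n in range(terms):
        t *= (alpha + n) * (beta + n) / ((n + 1) * (gamma + n)) * x
        s += t
    return s

def gauss_cf(alpha, beta, gamma, x, depth=60):
    """Truncation of 1/(1 - ax/(1 - bx/(1 - cx/...))) with the coefficients above."""
    coeffs = []
    for k in range(1, depth + 1):
        if k % 2 == 1:                     # a, c, e, ...
            j = (k - 1) // 2
            coeffs.append((alpha + j) * (gamma + j - beta)
                          / ((gamma + 2 * j) * (gamma + 2 * j + 1)))
        else:                              # b, d, ...
            j = k // 2
            coeffs.append((beta + j) * (gamma + j - alpha)
                          / ((gamma + 2 * j - 1) * (gamma + 2 * j)))
    value = 1.0
    for c in reversed(coeffs):             # evaluate from the bottom up
        value = 1.0 - c * x / value
    return 1.0 / value

alpha, beta, gamma, x = 0.5, 0.25, 1.5, 0.3
print(F(alpha, beta + 1, gamma + 1, x) / F(alpha, beta, gamma, x))
print(gauss_cf(alpha, beta, gamma, x))     # the two should agree to many digits
```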

Regarding the derivation of Gauss’ Formula, we note that it follows from these simple connections between different hypergeometric series:
\[F(\alpha, \beta+1, \gamma+1, x) - F(\alpha, \beta, \gamma, x) = ax F(\alpha+1, \beta+1, \gamma+2, x),\]
\[F(\alpha+1, \beta+1, \gamma+2, x) - F(\alpha, \beta+1, \gamma+1, x) = bx F(\alpha+1, \beta+2, \gamma+3, x),\]
\[F(\alpha+1, \beta+2, \gamma+3, x) - F(\alpha+1, \beta+1, \gamma+2, x) = cx F(\alpha+2, \beta+2, \gamma+4, x), \dots\]
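Each of these relations can be verified by comparing coefficients of powers of \(x\). A quick numerical spot-check of the first one, in the same style as the sketch above (again with arbitrary parameter values):

```python
def F(alpha, beta, gamma, x, terms=200):
    """Same partial-sum routine as in the previous sketch."""
    s = t = 1.0
    for n in range(terms):
        t *= (alpha + n) * (beta + n) / ((n + 1) * (gamma + n)) * x
        s += t
    return s

alpha, beta, gamma, x = 0.5, 0.25, 1.5, 0.3
a = alpha * (gamma - beta) / (gamma * (gamma + 1))
lhs = F(alpha, beta + 1, gamma + 1, x) - F(alpha, beta, gamma, x)
rhs = a * x * F(alpha + 1, beta + 1, gamma + 2, x)
print(abs(lhs - rhs) < 1e-12)              # expected: True (up to roundoff)
```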

To apply Gauss’ Formula to the expansion of \(S\) as a continued fraction, we should let \(\alpha = -n + l + 1\), \(\beta = 0\), \(\gamma = l+1\), \(x = -\frac{p}{q}\), which gives us the equation \[S = \cfrac{1}{1-\cfrac{c_1}{1+\cfrac{d_1}{1-\cfrac{c_2}{1+\cfrac{d_2}{1-\ddots}}}}},\] where, in general, \[c_k = \frac{(n-k-l)(l+k)}{(l+2k-1)(l+2k)}\frac{p}{q},\ \ \ \ d_k = \frac{k(n+k)}{(l+2k)(l+2k+1)}\frac{p}{q}.\]

We have here not an infinite, but a finite continued fraction, the last term of which will be \(\frac{d_{n-l-1}}{1}\) since \(c_{n-l} = 0\).
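The following Python sketch (ours) evaluates this finite continued fraction from the bottom up, using the \(c_k\) and \(d_k\) just defined, and compares it with the series definition of \(S\) from Part 1; the parameters are small and arbitrary:

```python
def series_S(n, p, l):
    """S = 1 + (n-l-1)/(l+2) (p/q) + ..., summed directly."""
    q = 1 - p
    s = t = 1.0
    for k in range(1, n - l):
        t *= (n - l - k) / (l + 1 + k) * (p / q)
        s += t
    return s

def cf_S(n, p, l):
    """The same S, evaluated as the finite continued fraction above."""
    q = 1 - p
    def c(k): return (n - k - l) * (l + k) / ((l + 2 * k - 1) * (l + 2 * k)) * (p / q)
    def d(k): return k * (n + k) / ((l + 2 * k) * (l + 2 * k + 1)) * (p / q)
    A = 1.0
    for k in range(n - l - 1, 0, -1):      # last term is d_{n-l-1}, since c_{n-l} = 0
        A = 1 - c(k) / (1 + d(k) / A)
    return 1 / A

print(series_S(30, 0.4, 15), cf_S(30, 0.4, 15))   # the two should agree
```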

It is also not difficult to see that each of the numbers \(c_k\) is less than 1 provided that \(\frac{n-l-1}{l+2}\frac{p}{q} <1\), as we will assume in the following argument.

Hence, denoting for brevity the continued fraction \[\cfrac{c_k}{1+\cfrac{d_k}{1-\cfrac{c_{k+1}}{1+\cfrac{d_{k+1}}{1-\ddots}}}}\] by the symbol \(\omega_k\), we have \(0<\omega_k< c_k\) and then we can establish a series of inequalities:
\[ S = \frac{1}{1-\omega_1} < \frac{1}{1-c_1}, \ \ S > \cfrac{1}{1-\cfrac{c_1}{1+\cfrac{d_1}{1-c_2}}},\]
\[S <  \cfrac{1}{1-\cfrac{c_1}{1+\cfrac{d_1}{1-\cfrac{c_2}{1+\cfrac{d_2}{1-c_3}}}}}, \cdots\]
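In other words, cutting the fraction off at \(1-c_1\), then at \(1-c_2\), then at \(1-c_3\), and so on, gives values that lie alternately above and below \(S\). A short Python illustration (ours, with small arbitrary parameters satisfying the assumption \(c_1<1\)):

```python
def S_truncation(n, p, l, m):
    """Markov's m-th approximation: cut the fraction at 1 - c_m (i.e., drop ω_{m+1})."""
    q = 1 - p
    def c(k): return (n - k - l) * (l + k) / ((l + 2 * k - 1) * (l + 2 * k)) * (p / q)
    def d(k): return k * (n + k) / ((l + 2 * k) * (l + 2 * k + 1)) * (p / q)
    A = 1 - c(m)
    for k in range(m - 1, 0, -1):
        A = 1 - c(k) / (1 + d(k) / A)
    return 1 / A

n, p, l = 30, 0.4, 15
q = 1 - p
S = t = 1.0                                # exact S, summed as a series
for k in range(1, n - l):
    t *= (n - l - k) / (l + 1 + k) * (p / q)
    S += t
for m in (1, 2, 3, 4):
    est = S_truncation(n, p, l, m)
    print(m, est, "above S" if est > S else "below S")   # should alternate
```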

 


 


[26] Markov's 1884 doctoral dissertation, written under the guidance of his advisor, P. L. Chebyshev, was entitled “On Some Applications of Algebraic Continued Fractions.” The concept of continued fractions dates back at least to John Wallis (1616–1703) in the 17th century.
    
    More generally, a continued fraction is an expression of the form \[a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \ddots}}}.\] The number of terms may be finite or infinite. If \(b_i = 1\), for all \(i\), the fraction is said to be “simple.” If all the terms of a finite continued fraction are rational, then the fraction represents a rational number. An infinite simple continued fraction whose terms are positive integers represents an irrational number.

[27] The hypergeometric series used here is generally denoted \[\,_2F_1(a,b;c,z) = \sum_{n=0}^\infty \frac{(a)_n(b)_n}{(c)_n} \frac{z^n}{n!},\] where \((k)_n = k (k+1)(k+2)\cdots(k+n-1) = \frac{(k+n-1)!}{(k-1)!}\) is the “ascending factorial” or “Pochhammer symbol.” As we have noted before, Markov did not use the symbol “\(!\).” These series have applications in many branches of mathematics, including differential equations and elliptic integrals. There are many identities involving hypergeometric series, some of which are used here. Although Euler and others used these series, the first comprehensive study was done by Gauss in 1813.
    
The subscripts 2 and 1 on \(F\) mean there are two ascending factorial expressions in the numerator and one in the denominator. This can be generalized to \(\,_p F_q\).
    
Many functions can be expressed as hypergeometric series; for example, \[\,_2 F_1 (1,1;2,-z) = \frac{\ln(1+z)}{z}.\]
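This identity is easy to check numerically by summing the series directly (a small sketch of ours, valid for \(|z|<1\)):

```python
from math import log

def hyp2F1(a, b, c, z, terms=200):
    """Partial sum of the 2F1 series; adequate here since |z| < 1."""
    s = t = 1.0
    for n in range(terms):
        t *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        s += t
    return s

z = 0.5
print(hyp2F1(1, 1, 2, -z))   # approximately 0.8109
print(log(1 + z) / z)        # ln(1.5)/0.5, also approximately 0.8109
```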

 

A Selection of Problems from A.A. Markov’s Calculus of Probabilities: Calculating Probabilities of Repeated Independent Events, Part 3


 

It remains to compare the previous inequalities with those which \(P\) satisfies and which we have shown above, and we will then be able to form a series of approximate values of the probability that event \(E\) occurs more than \(l\) times in the \(n\) experiments considered, where for each of these approximate values, we will know whether it is greater than the probability or, conversely, less than it.28

Based on these inequalities, interchanging \(p\) with \(q\) and replacing \(l\) by \(n-l'\), we find a series of approximate values of the probability that event \(E\) occurs less than \(l'\) times in the \(n\) considered experiments, where for each of these approximate values of this new probability, we will also know whether it is larger than the probability or, conversely, smaller than it.

And with approximate values of the probability that event \(E\) occurs more than \(l\) times and the probability that event \(E\) occurs less than \(l'\) times, where \(l > l'\), it is not difficult to obtain an approximate value of the probability that event \(E\) occurs not more than \(l\) times and not less than \(l'\) times, since the sum of these three probabilities must be 1.

For example, let \(p=\frac{3}{5}\), \(q=\frac{2}{5}\), and \(n=6520\), and let us seek the probability that the ratio of the number of occurrences of event \(E\) to the number of experiments differs from \(\frac{3}{5}\) by less than \(\frac{1}{50}\).

In other words, we seek the probability that event \(E\) occurs not more than 4042 times, and the complementary event not more than 2738 times.29

By the remarks established above, the calculation of the desired probability reduces to calculating the probability that event \(E\) occurs more than 4042 times and the probability that the complementary event occurs more than 2738 times.

Returning to the probability that event \(E\) occurs more than 4042 times, in the above formulas and inequalities, we must let \(p=\frac{3}{5}\), \(q= \frac{2}{5}\), \(n=6520\), \(l = 4042\).

Then: \[P_1 = \sqrt{ \frac{3260}{\pi\cdot 4043 \cdot 2477}} \left(\frac{3912}{4043}\right)^{4043} \left(\frac{2608}{2477}\right)^{2477},\] \[H =  e^{\frac{1}{12\cdot 6520} - \frac{1}{12\cdot 4043} - \frac{1}{12\cdot 2477}},\] and by means of logarithmic tables, we find...

 

The rest of this page in the original text is filled with highly detailed arithmetic involving logarithms with 10-digit mantissas. We shall omit this and jump to Markov's conclusions.

 

\[\log H > -0.00003,\] \[ 0.00004094 < P < 0.00004095.\]
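These values can be reproduced without 10-digit logarithm tables by a few lines of Python; the sketch below is our addition, with all quantities kept in log space so that nothing underflows:

```python
from math import lgamma, log, log10, exp, pi

n, p, l = 6520, 3/5, 4042
q = 1 - p
a, b = l + 1, n - l - 1                    # 4043 and 2477

logP = (lgamma(n + 1) - lgamma(a + 1) - lgamma(b + 1)
        + a * log(p) + b * log(q))         # exact P = C(n,a) p^a q^b
logP1 = (0.5 * log(n / (2 * pi * a * b))
         + a * log(n * p / a) + b * log(n * q / b))
H = exp(1 / (12 * n) - 1 / (12 * a) - 1 / (12 * b))

print(log10(H))                            # compare with log H > -0.00003
print(exp(logP))                           # compare with 0.00004094 < P < 0.00004095
print(exp(logP1), H * exp(logP1))          # the bracket H*P1 < P < P1
```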

On the other hand, we have:
\[c_1 = \frac{2477}{4044} \cdot  \frac{3}{2} = \frac{7431}{8088},\ \ d_1 = \frac{6521}{4044\cdot 4045} \cdot \frac{3}{2} = \frac{19563}{32715960},\]
\[c_2 = \frac{2476 \cdot 4044}{4045 \cdot 4046}\cdot \frac{3}{2} = \frac{7509708}{8183035},\ \ d_2 = \frac{6522 \cdot 3}{4046 \cdot4047} = \frac{3261}{2729027},\]
\[c_3 = \frac{2475 \cdot 4045}{4047 \cdot 4048} \cdot \frac{3}{2} = \left(1 - \frac{2}{4047} \right) \frac{7425}{8096},\]
and, by performing simple calculations, we get, consecutively,
\[c_3< 0.9167,\ \ \frac{d_2}{1-\omega_3} < \frac{3261}{0.0833\times 2729027} < 0.01435,\]
\[0.918 > c_2 > \omega_2 >\frac{c_2}{1.01435}  > 0.9047,\]
\[0.0074 > \frac{d_1}{0.082} > \frac{d_1}{1-\omega_2} > \frac{d_1}{0.0953} > 0.00626,\]
\[0.912 < \frac{c_1}{1.0074} < \omega_1 < \frac{c_1}{1.00626} < 0.9131,\]
\[11.36 < \frac{1}{0.088} < S < \frac{1}{0.0869} < 11.51;\]
Consequently,
\[SP < \frac{0.4095}{869} < 0.0004713,\] but
\[SP > \frac{0.4094}{880} > 0.000465.\]
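The chain of estimates above can be retraced in a few lines of Python (our sketch; the \(c_k\) and \(d_k\) are computed as exact fractions, and the two-sided bound on \(S\) then follows from \(0<\omega_3<c_3\), exactly as in Markov's argument):

```python
from fractions import Fraction

n, l = 6520, 4042
pq = Fraction(3, 2)                        # p/q = (3/5)/(2/5)

def c(k): return Fraction((n - k - l) * (l + k), (l + 2 * k - 1) * (l + 2 * k)) * pq
def d(k): return Fraction(k * (n + k), (l + 2 * k) * (l + 2 * k + 1)) * pq

print(c(1) == Fraction(7431, 8088), d(1) == Fraction(19563, 32715960))   # True True

c1, d1, c2, d2, c3 = map(float, (c(1), d(1), c(2), d(2), c(3)))
omega2_hi, omega2_lo = c2, c2 / (1 + d2 / (1 - c3))      # bounds on ω_2 from 0 < ω_3 < c_3
omega1_hi = c1 / (1 + d1 / (1 - omega2_lo))              # pushed back up to ω_1
omega1_lo = c1 / (1 + d1 / (1 - omega2_hi))
print(1 / (1 - omega1_lo), 1 / (1 - omega1_hi))          # compare with 11.36 < S < 11.51
```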

Proceeding to the probability that the event complementary to \(E\) occurs more than 2738 times, we must, in the above formulas and inequalities, interchange \(p=\frac{3}{5}\) with \(q= \frac{2}{5}\) and let \(n=6520\), \(l = 2738\), ...

 

This page is similar to the previous pages in featuring detailed arithmetic using logarithms.

 

\[\log H < -0.00003,\] \[0.00004280 < P < 0.00004281,\] \[\dots\] \[SP < 0.0005002,\] \[SP > 0.000491.\]

So, the probability that event \(E\) occurs more than 4042 times in the 6520 considered experiments lies between 0.000465 and 0.0004713, and the probability that in these experiments, event \(E\) occurs less than 3782 times lies between 0.000491 and 0.0005002.

And, therefore, the probability that event \(E\) occurs not less than 3782 times and not more than 4042 times lies between \(1 - 0.000972 = 0.999028\) and \(1 - 0.000956 = 0.999044\).
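As a final check, the three probabilities can also be computed by brute force (our sketch, again working in log space); the results should fall inside Markov's brackets:

```python
from math import lgamma, log, exp

def log_pmf(n, p, m):
    """log of C(n, m) p^m (1-p)^(n-m), kept in log space to avoid underflow."""
    return (lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)
            + m * log(p) + (n - m) * log(1 - p))

n, p = 6520, 3/5
upper = sum(exp(log_pmf(n, p, m)) for m in range(4043, n + 1))   # E more than 4042 times
lower = sum(exp(log_pmf(n, p, m)) for m in range(0, 3782))       # E fewer than 3782 times
print(upper)              # compare with 0.000465 ... 0.0004713
print(lower)              # compare with 0.000491 ... 0.0005002
print(1 - upper - lower)  # compare with 0.999028 ... 0.999044
```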

 


 


[28] In essence, Markov was computing the “convergents” of the continued fraction, where by the \(k\)th convergent \(c_k\) of an irrational number \(r\), we mean the rational number obtained by truncating the continued fraction to the first \(k\) terms. It can be shown that \[c_0 < c_2< c_4< \cdots < r< \cdots< c_5< c_3<c_1,\] so the convergents alternate between being less than \(r\) and greater than \(r\). Both sides converge to \(r\). Furthermore, if we express \(c_k = \frac{p_k}{q_k}\) as a fraction in lowest terms, there is no rational number with denominator less than \(q_k\) that is closer to \(r\).
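As a small illustration of this alternation (our addition), the following Python snippet lists the first few convergents of \(\sqrt{2} = [1; 2, 2, 2, \dots]\):

```python
from math import sqrt
from fractions import Fraction

def convergents(terms):
    """Convergents p_k/q_k of the simple continued fraction [a0; a1, a2, ...]."""
    p_prev, p = 1, terms[0]
    q_prev, q = 0, 1
    yield Fraction(p, q)
    for a in terms[1:]:
        p, p_prev = a * p + p_prev, p      # standard convergent recurrences
        q, q_prev = a * q + q_prev, q
        yield Fraction(p, q)

r = sqrt(2)
for k, conv in enumerate(convergents([1] + [2] * 7)):
    print(k, conv, "below" if conv < r else "above")   # should alternate below/above
```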

[29] \(4042 = \left\lfloor 6520\left(\frac{3}{5} + \frac{1}{50}\right)\right\rfloor\), since \(6520\left(\frac{3}{5} + \frac{1}{50}\right) = 4042.4\); similarly, \(2738 = \left\lfloor 6520 - 6520\left(\frac{3}{5}-\frac{1}{50}\right)\right\rfloor\), since \(6520 - 6520\left(\frac{3}{5}-\frac{1}{50}\right) = 2738.4\).