Review statement of Problem 3.

The other probabilities referred to in the problem represent three particular cases of the probability just found and, therefore, can be gotten from the expression \(\frac{\Delta^l Z_{n-i-l}}{Z_n}\) for particular assumptions about \(i\) and \(l\): \[\mathrm{1)}\ i=0, \ \ \ \ \mathrm{2)}\ i=n-l,\ \ \ \ \mathrm{3)}\ i=0, l=n.\]

Letting \(i = 0\), we get the following expression for the probability of the appearance of \(l\) specific numbers: \[\frac{\Delta^l Z_{n-l}}{Z_n} = 1 - \frac{l}{1}\left(\frac{n-m}{n}\right)^k + \frac{l(l-1)}{1\cdot2} \left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k - \cdots \]

Letting \(i = n - l\), we find that the probability of the appearance of \(l\) specific numbers and the nonappearance of the remaining is equal to:

\[\begin{align} \frac{\Delta^l Z_0}{Z_n} &= \frac{Z_l}{Z_n} - \frac{l}{1}\frac{Z_{l-1}}{Z_n} + \frac{l(l-1)}{1 \cdot 2} \frac{Z_{l-2}}{Z_n} - \cdots \\ &= \left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k \cdots \left(\frac{l-m+1}{l+1}\right)^k \left\lbrace 1 - \frac{l}{1} \left(\frac{l-m}{l}\right)^k + \cdots \right\rbrace. \end{align}\]

Finally, the probability of the appearance of all \(n\) numbers is equal to:

\[\frac{\Delta^n Z_0}{Z_n} = 1 - \frac{n}{1}\left(\frac{n-m}{n}\right)^k + \frac{n(n-1)}{1\cdot2} \left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k - \cdots.\]

Stopping at the last formula and noting that it requires tedious calculation for large values of \(n\), we derive two approximate formulas for it.

For getting the first approximate formula, assume that all numbers \(\left(\frac{n-m-1}{n-1}\right)^k\), \(\left(\frac{n-m-2}{n-2}\right)^k, \dots\), are equal to \(\left(\frac{n-m}{n}\right)^k\) which, for brevity, we denote by the letter \(t\).

Under this assumption, the indicated formula immediately gives us: \[\frac{\Delta^n Z_0}{Z_n} \approx (1-t)^n,\] where \(t = \left(\frac{n-m}{n}\right)^k\).

For the second approximation, we note that for small values of \(i\), the ratio \(\left(\frac{n-m-i}{n-i}\right)^k : \left(\frac{n-m}{n}\right)^k \), which is equal to \(\left\lbrace 1 - \frac{im}{(n-i)(n-m)}\right\rbrace^k\), differs little from \(1 - \frac{kim}{n(n-m)}\), and the product \[\left(1 - \frac{km}{n(n-m)} \right) \left(1 - \frac{2km}{n(n-m)} \right) \cdots \left(1 - \frac{ikm}{n(n-m)} \right) \] differs little from \[1-\frac{km(1+2+\cdots+i)}{n(n-m)} = 1 - \frac{kmi(i+1)}{2n(n-m)}.\]

On this basis, we accept \(t^{i+1}\left(1 - \frac{kmi(i+1)}{2n(n-m)}\right)\) as the approximate value of each product \[\left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k \cdots \left(\frac{n-m-i}{n-i}\right)^k.\]

Substituting this approximate expression in place of the exact one in the formula, we get \[\frac{\Delta^n Z_0}{Z_n} \approx (1-t)^n - \frac{kmt^2(n-1)}{2(n-m)}(1-t)^{n-2}\approx (1-t)^n\left\lbrace1-\frac{kmt^2}{2}\right\rbrace,\] since we assume the numbers \(\frac{n-1}{n-m}\) and \(\frac{1}{(1-t)^2}\) are close to 1.

We apply our approximate formula to finding the number of draws under the hypothesis that the probability of the appearance of all numbers was approximately equal to a given number \(\frac{1}{C}\).

The first approximation formula gives \((1-t)^n \approx \frac{1}{C}\) from which we deduce \(n \log(1-t) \approx -nt \approx -\log C\); but \(t = \left(\frac{n-m}{n}\right)^k\) and, thus, \(\log t = k \log\left(1- \frac{m}{n}\right) \approx - \frac{km}{n}\).^{11}

Comparing the approximate equations \(-nt \approx -\log C\) and \(\log t \approx -\frac{km}{n}\), we find \[k \approx \frac{n(\log n- \log \log C)}{m} .\]

In the subsequent calculation, let \(\frac{\log C}{n} = t_0\) and \(\frac{n(\log n- \log \log C)}{m} = k_0\), so that \(t_0\) and \(k_0\) are the approximate values of the numbers \(t\) and \(k\).

The second approximate expression of the probability gives \((1-t)^n \left(1-\frac{kmt^2}{2}\right) \approx \frac{1}{C}\), from which, by performing the approximate calculations, we deduce

\[\log C \approx nt + \frac{nt^2}{2} + \frac{kmt^2}{2} \approx nt + \frac{(n+k_0 m) t_0^2}{2},\]

\[t \approx \frac{\log C}{n} \left[ 1-\frac{(n+k_0 m)t_0^2}{2 \log C}\right],\]

and then \[-\log t \approx \log n- \log \log C + \frac{(n + k_0 m) t_0^2}{2 \log C}.\]

On the other hand, we have \[ -\log t = -k \log\left(1-\frac{k}{m}\right) \approx \frac{km}{n} + \frac{k_0 m^2}{2n^2}.\]

Finally, equating one approximate expression for $\log t$ to the other, we arrive at the approximate equation \[ \frac{km}{n} + \frac{k_0 m^2}{2n^2} \approx \log n - \log \log C + \frac{(n + k_0 m) t_0^2}{2 \log C},\] from which we easily deduce

\[\begin{align} k &\approx \frac{n}{m} \left\lbrace \log n - \log \log C + \frac{k_0 m}{2 n^2}(\log C - m) + \frac{1}{2n} \log C\right\rbrace \\

&\approx \frac{1}{m} \left\lbrace (\log n - \log \log C) \left( n + \frac{1}{2} \log C - \frac{m}{2} \right) + \frac{1}{2} \log C\right\rbrace.

\end{align}\]

For example, let us assume \(n= 90,\ m=5,\ C=2\).

Then

\(\log n = 4.4998\dots\), |
\(\log C = 0.69314\dots\), |

\(\log \log C = -0.3665\dots\), |
\(n + \frac{1}{2} \log C - \frac{m}{2} = 87.84657\dots\). |

and, performing a simple calculation by the previous approximate formula, we get \[k = \frac{4.8663 \times 87.8466 + 0.346}{5} \approx 85.5.\]

According to this result, it can be verified that the probability of the appearance of all 90 numbers in 85 draws is somewhat less than one-half, and for 86 draws is already bigger than one-half.

Continue to statement of Problem 4.

[11] Here, Markov used “log” to denote natural (base \(e\)) logarithms. Later, he used the same notation for common (base 10) logarithms!

The Taylor series for \(\ln(1-t)\) is \(-t -\frac{t^2}{2} - \frac{t^3}{3} - \cdots \approx -t\), for small values of t. (The series converges for \(-1 \leq t < 1\).) Later, he included the quadratic term for a better approximation.