Expected Value and Convergence

Convergence of Discrete CDFs

Let \(X\) be a discrete RV with CDF \[F_X(x) = \sum_{k = 0}^K p_k \cdot \mathbb{1}_{[x_k, +\infty)}(x)\]

If \(\sum_{k = 0}^K |x_k| \cdot p_k < +\infty\), then the expected value \(\mathbb{E}X\) is \[\mathbb{E}X = \sum_{k=0}^{K}x_kp_k\]

If \(\sum_{k = 0}^K |x_k| \cdot p_k = +\infty\), then the expected value \(\mathbb{E}X\) does not exist.

When \(K = +\infty\), the expected value is an infinite series: \[\sum_{k=0}^{+\infty}x_k p_k = x_0p_0 + x_1p_1 + x_2p_2 + \ldots\]

But by our intuition, \(x_0, x_1, x_2 \ldots\) are created equal! The order in which we add them doesn’t matter. We should have

\[\sum_{k=0}^{+\infty}x_k p_k = \sum_{k=0}^{+\infty}x_{\sigma(k)} p_{\sigma(k)}\] for every permutation \(\sigma\) of the indices, so that our sum is permutation invariant. The absolute convergence condition \(\sum_{k=0}^{+\infty}|x_k| p_k < +\infty\) is exactly what guarantees this equation.

Convergence of Continuous CDFs

Let X be a continuous RV with PDF \(p_X(x)\).

  1. If \(\int_{-\infty}^{+\infty}|x|p_X(x)\,dx < +\infty\), then the expected value is \[\mathbb{E}X = \int_{-\infty}^{+\infty}xp_X(x)\,dx.\]
  2. If \(\int_{-\infty}^{+\infty}|x|p_X(x)\,dx = +\infty\), then the expected value \(\mathbb{E}X\) does not exist.

Remark: The condition “\(\int_{-\infty}^{+\infty}|x|p_X(x)\,dx < +\infty\)” involves something more subtle… “Lebesgue integrals”.

Example: Let X be continuous RV with the following PDF

\[p_X(x) = \frac{1}{\pi(1+x^2)}, x \in \mathbb{R}\]

The expected value does not exist. (to be proved in HW5)
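Though the proof is left to HW5, we can see the symptom in a quick simulation: running sample means from this PDF (the standard Cauchy) never settle down the way they would for a distribution with a finite mean. This is a hedged sketch; the inverse-CDF sampler and the sample sizes are our own choices.

```python
import math
import random

random.seed(0)

def cauchy_sample():
    # Inverse-CDF sampling: if U ~ Unif(0,1), then tan(pi*(U - 1/2))
    # has the PDF p_X(x) = 1 / (pi * (1 + x^2)).
    return math.tan(math.pi * (random.random() - 0.5))

# Record the running sample mean every 10,000 draws. For a distribution
# with a finite expected value these would converge; here they keep jumping.
total = 0.0
running_means = []
for i in range(1, 100_001):
    total += cauchy_sample()
    if i % 10_000 == 0:
        running_means.append(total / i)
```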

Transformations of RVs

\[\Omega = \{ \text{students taking APMA 1655} \}\] (each \(\omega \in \Omega\) is a student)

\(X(\omega) = \text{ the score } \omega \text{ gets in the final exam }\)

The professor wants to curve the score like this: \[Y(\omega) = \min\{X(\omega)^2, 100\}\]

If \(g(x) = \min\{x^2, 100\}\), we have \(Y(\omega) = g(X(\omega))\). Suppose we are given

  - a RV \(X: \Omega \rightarrow \mathbb{R}\)
  - a function \(g: \mathbb{R} \rightarrow \mathbb{R}\)

\[\Omega \rightarrow \mathbb{R} \rightarrow \mathbb{R}\] \[\omega \rightarrow X(\omega) \rightarrow g(X(\omega))\]

We know how to calculate \(\mathbb{E}X\). To compute \(\mathbb{E}[g(X)]\) we need a new definition.

Expected value of a transformed RV

Definition: Suppose \(X\) and \(g\) are given.

  1. Suppose \(X\) is discrete and has CDF \(\sum_{k=0}^K p_k \cdot \mathbb{1}_{[x_k, +\infty)}(x)\). If \(\sum_{k=0}^K|g(x_k)| p_k < +\infty\), then \[\mathbb{E}[g(X)] = \sum_{k=0}^K g(x_k)\cdot p_k.\]
  2. Suppose \(X\) is continuous and has PDF \(p_X(x)\). If \(\int_{-\infty}^{\infty}|g(x)|p_X(x)\,dx < +\infty\), then \[\mathbb{E}[g(X)] = \int_{-\infty}^{\infty}g(x)p_X(x)\,dx.\]

Otherwise, \(\mathbb{E}[g(X)]\) does not exist!

This definition is actually a theorem in graduate-level probability theory, called the “Law of the unconscious statistician”.
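To make the discrete case concrete, here is a minimal sketch applying it to the exam-curving function \(g(x) = \min\{x^2, 100\}\). The score distribution (values \(x_k\) and probabilities \(p_k\)) below is hypothetical, purely for illustration.

```python
# Hypothetical discrete distribution of exam scores (made up for illustration).
scores = [4, 6, 8, 9, 10, 11]              # hypothetical values x_k
probs = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]     # p_k (sums to 1)

def g(x):
    # the professor's curve
    return min(x * x, 100)

# Law of the unconscious statistician: E[g(X)] = sum over k of g(x_k) * p_k
e_gx = sum(g(x) * p for x, p in zip(scores, probs))
```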

Expected value of any RV

For a random variable whose CDF can be decomposed in this way (all of them can):

Definition: Let \(X\) be a RV with the following CDF: \[F_X(x) = p \cdot F_Z(x) + (1 - p)F_W(x)\] where \(0 \leq p \leq 1\), \(Z\) is a discrete RV with CDF \(F_Z\), and \(W\) is a continuous RV with CDF \(F_W\).

We define \(\mathbb{E}X = p \cdot \mathbb{E}Z + (1 - p) \cdot \mathbb{E}W\), if \(\mathbb{E}Z\) and \(\mathbb{E}W\) exist.

going back to a homework problem

\(X = YZ + (1 - Y) \cdot W\) where \(Y \sim\) Bernoulli(\(\frac{1}{3}\)) and \(Z \sim\) Pois(\(\lambda\)). As we know from homework problem 4.5, we have \[\mathbb{E}Z = \lambda\] and also \[\mathbb{E}W = 1000\]

Then \(\mathbb{E}X = \frac{1}{3}\lambda + \frac{2}{3}\cdot 1000\), by the definition of the expected value of any RV.

Variance

Let \(X\) be a random variable from some distribution.

The distribution generates numbers \(X_1, X_2, \ldots, X_n\).

For example, if we use Bernoulli we will get 0 1 0 1 0 0 1 …

Well, for this sequence, we can take the average \(\bar{X}_n = \frac{X_1 + X_2 + \ldots + X_n}{n} \approx \mathbb{E}X\). But this is not exactly equal to \(\mathbb{E}X\) for \(n = 10\) or \(100\)… For \(n\) numbers, we have the error \[e_n = |\bar{X_n} - \mathbb{E}X|\]

What does the error \(e_n\) depend on?

Simulation Study: Discrete

We can relate the error \(e_n\) to the discrete distribution as follows.

Generate \(X_1, X_2, \ldots X_n\) from Bernoulli(\(p\)).

Claim: \(e_n = |\bar{X_n} - p|\) is likely to be smaller than \[\sqrt{p \cdot (1-p) \cdot \frac{2\log(\log(n))}{n}}\]

\(V\) here is \(p(1-p)\).
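A minimal simulation sketch of this claim (the choices \(p = 0.5\), \(n = 10{,}000\), 200 repetitions, and the seed are ours): generate many Bernoulli sample means and count how often the error stays under the bound.

```python
import math
import random

random.seed(1)

p, n, trials = 0.5, 10_000, 200
# the claimed bound: sqrt(p(1-p) * 2 log(log(n)) / n)
bound = math.sqrt(p * (1 - p) * 2 * math.log(math.log(n)) / n)

# Count the runs in which |xbar - p| stays below the claimed bound.
inside = 0
for _ in range(trials):
    xbar = sum(random.random() < p for _ in range(n)) / n
    if abs(xbar - p) <= bound:
        inside += 1

fraction_inside = inside / trials
```

With these settings the bound comes out near 0.0105, roughly twice the standard error of \(\bar{X}_n\), so almost every run lands inside it.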

Simulation Study: Poisson

We have a similar error \(e_n\) for other distributions.

Generate \(X_1, X_2, \ldots X_n\) from Poisson(\(\lambda\)).

Claim: \(e_n = |\bar{X_n} - \lambda|\) is likely to be smaller than \[\sqrt{\lambda \cdot \frac{2\log(\log(n))}{n}}\]

\(V\) here is \(\lambda\).

What does “likely” mean in this context? If we run the simulation many times, the fraction of runs in which \(e_n\) stays below the bound approaches 1.

General conjecture

Assume that the expected value exists. Generate \(X_1, X_2, \ldots X_n\) from a distribution.

\(e_n = |\bar{X_n} - \mathbb{E}X|\) is likely to be smaller than \[\sqrt{V \cdot \frac{2\log(\log(n))}{n}}\]

\(V\) is the variance of \(X\).

This is the law of the iterated logarithm.

Definition of Variance

Okay, now we go to a result. The proof isn’t related to anything above, only its application.

Let X be a RV, whose expected value \(\mathbb{E}X\) exists.

We define a function \(g(x) = (x - \mathbb{E}X)^2\)

We call \(\mathbb{E}[g(X)]\) the variance of \(X\) if \(\mathbb{E}[g(X)]\) exists. That is,

\[\text{V} (X) = \mathbb{E}[(X - \mathbb{E}X)^2]\]

Variance is nothing but the expected squared deviation from \(\mathbb{E}X\): the squared deviation \(g(X) = (X - \mathbb{E}X)^2\) is itself a random variable, and the variance is its expected value.

Concept Review

0 is not impossible

E is an event: \(E \subseteq \Omega\).

“E is impossible” (\(E = \emptyset\)) implies that \(\mathbb{P}(E) = 0\). But \(\mathbb{P}(E) = 0\) does not mean that E is impossible!

1 is not inevitable

Similarly,

“E is inevitable” (\(E = \Omega\)) implies that \(\mathbb{P}(E) = 1\). But \(\mathbb{P}(E) = 1\) does not mean that E is inevitable!

constant distributions

If \(X(\omega) = c \text{ for all } \omega \in \Omega\), then \(\text{Var}(X) = 0\).

Conversely, if \(\text{Var}(X) = 0\), then there must exist a constant \(c\) such that \[\mathbb{P}(\{\omega \in \Omega: X(\omega) = c\}) = 1\]

Law of Large Numbers (LLN)

A theorem that describes the result of performing an experiment a large number of times.

Let \(X \sim\) a distribution. The distribution generates random numbers \(X_1, X_2, \ldots, X_n\).

\[\bar{X}_n = \frac{X_1 + X_2 +\ldots + X_n}{n} \approx \mathbb{E}X\]

For example:

Bernoulli(\(p\)) has \(\mathbb{E}X = p\). Pois(\(\lambda\)) has \(\mathbb{E}X = \lambda\).

Okay. So what does \(\approx\) really mean?

And what other conditions do we need for this to be true?

RVs i.i.d.

Let \(\{X_i\}_{i=1}^{\infty}\) be an infinitely long sequence of RVs defined on \((\Omega, \mathbb{P})\). As we know, we say \(X_1, X_2, \ldots\) are independent if, for every \(n\) and all sets \(A_1, \ldots, A_n\), \[\mathbb{P}(\{\omega \in \Omega: X_i(\omega) \in A_i \text{ for all } i = 1,2,\ldots, n \})\]

is equal to \[\prod_{i=1}^n \mathbb{P}(\{\omega \in \Omega: X_i(\omega) \in A_i\})\]

They are not just pairwise independent. They are independent!

Definition: Let \(\{X_i\}_{i=1}^{\infty}\) be an infinitely long sequence of RVs defined on \((\Omega, \mathbb{P})\). We say \(X_1, X_2, \ldots\) are independently and identically distributed if

  1. \(X_1, X_2, \ldots\) are independent.
  2. \(X_1, X_2, \ldots\) share the same CDF, i.e. \(F_{X_1} = F_{X_2} = F_{X_3} = \ldots\)

Example: Mike Meng flips a coin. Taylor Swift flips a coin. The outcomes of these coin tosses are two random variables that are independent and share the same CDF.

back to the LLN

Heuristic: \[\bar{X}_n = \frac{X_1 + X_2 +\ldots + X_n}{n} \approx \mathbb{E}X_1\]

Theorem: Let \(\{X_i\}_{i=1}^{\infty}\) be an infinitely long sequence of RVs defined on \((\Omega, \mathbb{P})\).

Suppose that \(X_1, X_2, \ldots\) are independently and identically distributed.

Suppose that \(\mathbb{E}X_1\) exists.

As we know, we can take the mean \(\bar{X}_n\). Then define the event

\[A = \{\omega \in \Omega: \lim_{n \rightarrow \infty} \bar{X}_n(\omega) = \mathbb{E}X_1\}\] This is the event that we run each of the RVs \(X_1, X_2, \ldots\) in the infinite sequence once (getting some outcome for each), and the running averages converge to \(\mathbb{E}X_1\). Then \[\mathbb{P}(A) = 1\]

This does not mean that \(A = \Omega\) (that A is inevitable).

\[\frac{X_1(\omega) + X_2(\omega) + \ldots + X_n(\omega)}{n} \rightarrow \mathbb{E}X_1\] The sample average converges to the population average with probability one. This convergence is not inevitable. For example, in the experiment where we flip a coin infinitely many times, there is an outcome \(\omega'\) that is all tails.

infinite coin flips and all tails

We flip a coin \(\infty\) times.

Let \(X_i = 1\) if the \(i\)-th flip is H and \(X_i = 0\) if it is T. An outcome is an infinitely long sequence of H and T: \[\omega = (H, H, T, H, T, \ldots)\]

\[\Omega = \{\omega = (\omega^1, \omega^2, \ldots, \omega^n, \ldots)\}\]

Each \(\omega^i\) is either H or T.

\[A = \{ \omega \in \Omega: \lim_{n \rightarrow \infty} \frac{X_1 + X_2 +\ldots + X_n}{n} = \frac{1}{2} \}\]

We could always get tails. This is possible!

For \(\omega' = (T, T, T, \ldots)\), \[\lim_{n \rightarrow \infty} \frac{X_1(\omega') + X_2(\omega') +\ldots + X_n(\omega')}{n} = 0 \]

Each \(\omega\) is an infinite sequence of coin flips. The sample space contains infinitely many \(\omega\), and the law of large numbers says that the set \(A\) of those \(\omega\) whose limiting average is \(\frac{1}{2}\) satisfies \(\mathbb{P}(A) = 1\).

If \(\mathbb{E}X_1\) does not exist, the LLN is not necessarily true.

If \(X_1, X_2, \ldots\) are not independent, the LLN is not necessarily true. For example, if we set all of the random variables equal to a single Bernoulli(\(\frac{1}{2}\)) variable \(X\), the limiting average will be 0 or 1, not \(\frac{1}{2}\).

For each \(i = 1,2, \ldots\)

\[X_i(\omega) = X(\omega) \text{ for all } \omega \in \Omega\] \[X_1 = X_2 = X_3 = \ldots = X\] \[A = \{\omega \in \Omega: \lim_{n \rightarrow \infty} \frac{X_1(\omega) + X_2(\omega) +\ldots + X_n(\omega)}{n} = \frac{1}{2}\}\]

The LLN anticipates \(\mathbb{P}(A) = 1\). However, \(\mathbb{P}(A) = \mathbb{P}(X = \frac{1}{2}) = \mathbb{P}(\emptyset) = 0\)

Monte Carlo Integration

Let \(X\) be a continuous RV with PDF \(p_X(x)\), and let \(g(x)\) be a real-valued function. Then we have

\[\mathbb{E}[g(X)] = \int_{-\infty}^\infty g(x) p_X(x) dx \]

\[\text{Var} [g(X)] = \int_{-\infty}^\infty (g(x) - \mathbb{E}[g(X)])^2 p_X(x)\, dx \]

For example, let \(X\) ~ Unif(0,1) and \[g(x) = \arccos(\frac{\cos\frac{\pi}{2}x}{1 + 2\cos(\frac{\pi}{2}x)})\]

Then we have \(\mathbb{E}[g(X)] = \int_{-\infty}^\infty g(x) \mathbf{1}_{(0,1)}(x)\,dx = \int_{0}^{1} g(x)\, dx = \frac{5\pi}{12}\). This value can be approximated by sampling \(X\) many times and averaging \(g(X)\).
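A hedged Monte Carlo sketch of this example (sample size and seed are our own choices): draw uniforms, average \(g\), and the estimate comes out near \(5\pi/12 \approx 1.309\).

```python
import math
import random

random.seed(2)

def g(x):
    # the integrand from the example above
    c = math.cos(math.pi / 2 * x)
    return math.acos(c / (1 + 2 * c))

# Monte Carlo integration: E[g(X)] for X ~ Unif(0,1) is approximated
# by the sample mean of g over uniform draws.
n = 200_000
estimate = sum(g(random.random()) for _ in range(n)) / n
```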

Generalized LLN

Let \(X_1, X_2, \ldots\) be an infinite sequence of random variables that are independently and identically distributed, and let \(g(x)\) be a continuous function. Suppose further that \(\mathbb{E}[g(X_1)]\) exists.

Define

\[A = \{\omega \in \Omega: \lim_{n \rightarrow \infty} \frac{g(X_1(\omega)) + g(X_2(\omega)) +\ldots + g(X_n(\omega))}{n} = \mathbb{E}[g(X_1)]\}\]

Then \(\mathbb{P}(A) = 1\). More intuitively: for almost every outcome \(\omega\), the running averages of \(g(X_1(\omega)), g(X_2(\omega)), \ldots\) converge to \(\mathbb{E}[g(X_1)]\).

The error \(|e_n(\omega)|\) (the difference between the average of the \(g(X_i(\omega))\) and \(\mathbb{E}[g(X_1)]\)) is likely to be smaller than

\[\sqrt{\text{Var}[g(X_1)] \frac{2\ln(\ln(n))}{n}}\]

This is known as the law of the iterated logarithm.

Review of the LLN

Let \(X_1, X_2, \ldots\) be i.i.d. and suppose their expected values exist. When \(n\) is large,

\[\bar X_n(\omega) = \frac{X_1(\omega) + \ldots + X_n(\omega)}{n} \approx \mathbb{E}X_1 = \mathbb{E}X_2 = \ldots\]

Unless you are extremely unlucky.

Law of the Iterated Logarithm

The law of the iterated logarithm bounds the error \(e_n\) by an expression involving \(\text{Var}(X_1)\) and an iterated logarithm.

Let \(X_1, X_2, X_3, \ldots\) be RVs defined on \((\Omega, \mathbb{P})\). Suppose they are i.i.d. and that \(\mathbb{E}X_1\) and \(\text{Var}(X_1)\) exist.

Intuitively, when \(n\) is large, \[|e_n(\omega)| \leq \sqrt{Var(X_1) \frac{2\log(\log(n))}{n}}\] is true for almost all \(\omega\) in \(\Omega\), so that \(\mathbb{P}\) of these \(\omega\) approaches 1.

Intuitively, when \(n\) is large, \[|e_n(\omega)| > \sqrt{Var(X_1) \frac{2\log(\log(n))}{n}}\] is true for almost no \(\omega\) in \(\Omega\), so that \(\mathbb{P}\) of these \(\omega\) approaches 0.

Law of the Iterated Logarithm (rigorous)

\[\lim_{m \rightarrow \infty}\left[\sup_{n \geq m}\left(\frac{|e_n(\omega)|}{\sqrt{Var(X_1) \frac{2\log(\log(n))}{n}}}\right)\right] = 1\]

holds for almost all \(\omega\) in \(\Omega\), i.e. for a set of \(\omega\) with probability one.

An example: quantifying the error of a Monte Carlo integration

\(U_1, U_2, U_3, \ldots \sim\) Unif(0,1), i.i.d.

\[g(x) = \arccos(\frac{\cos\frac{\pi}{2}x}{1 + 2\cos(\frac{\pi}{2}x)})\]

\[\frac{g(U_1(\omega)) + \ldots + g(U_n(\omega))}{n} \approx \mathbb{E}[g(U)] = \int_{0}^{1} \arccos(\frac{\cos\frac{\pi}{2}x}{1 + 2\cos(\frac{\pi}{2}x)}) dx \]

\(X_i(\omega) = g(U_i(\omega))\) for all \(i\) and \(\omega\)

So that we have

\[\bar{X_n}(\omega) = \frac{X_1(\omega) + \ldots + X_n(\omega)}{n} \approx \mathbb{E}X_1 = \mathbb{E}[g(U)]\]

\[e_n(\omega) = \bar{X_n}(\omega) - \mathbb{E}X_1\]

Let’s say that we want the error to be no higher than \(10^{-5}\). Then we would have

\[|e_n(\omega)| \leq \sqrt{Var(X_1) \frac{2\log(\log(n))}{n}}\leq 10^{-5}\]

\[\sqrt{0.007556 \frac{2\log(\log(n))}{n}}\leq 10^{-5}\]

Solving this numerically for \(n\) shows that such a tight tolerance requires an enormous sample size, on the order of hundreds of millions.
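The required \(n\) can be found with a simple fixed-point iteration on the inequality; a sketch, reusing the variance \(0.007556\) quoted above and the tolerance \(10^{-5}\):

```python
import math

def required_n(variance, tol):
    # Treat the bound as an equality and iterate
    # n = variance * 2 * log(log(n)) / tol^2 until it stabilizes.
    n = 10.0
    for _ in range(100):
        n = variance * 2 * math.log(math.log(n)) / tol ** 2
    return n

# Tolerance 1e-5 with Var(X_1) ~ 0.007556 needs n in the hundreds of millions.
n_needed = required_n(0.007556, 1e-5)
```

The iteration converges quickly because \(\log(\log n)\) barely moves as \(n\) changes.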

Random Walks

We have \[\bar{X_n}(\omega) = \frac{X_1(\omega) + \ldots + X_n(\omega)}{n}\]

But we could also look at \[S_n(\omega) = X_1(\omega) + \ldots + X_n(\omega)\]

This is a random walk!

Let \(\{X_i\}_{i=1}^{\infty}\) be a sequence of RVs defined on \((\Omega, \mathbb{P})\), with the \(X_i\) i.i.d. For each positive \(n\), we define \(S_n(\omega) = X_1(\omega) + \ldots + X_n(\omega)\). The sequence \(\{S_n\}_{n=1}^{\infty}\) is the random walk.

Example 1: \[\mathbb{P}(X_1 = -1) = \mathbb{P}(X_1 = 1) = \frac{1}{2}\]

Then \(\{S_n\}_{n=1}^{\infty}\) is called the 1-dim simple RW.
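A short sketch simulating one path of this walk (the path length and seed are our own choices):

```python
import random

random.seed(3)

# One path of the 1-dim simple random walk: each step X_i is +1 or -1
# with probability 1/2, and path[k-1] holds S_k = X_1 + ... + X_k.
n = 1000
path = []
s = 0
for _ in range(n):
    s += random.choice([-1, 1])
    path.append(s)
```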

Central Limit Theorem: prelude

Review:

Let \(\{X_i\}_{i=1}^{\infty}\) be a sequence of i.i.d. RV on \((\Omega, \mathbb{P})\). We have a sum \[ S_n(\omega) = X_1(\omega) + X_2(\omega) + \ldots + X_n(\omega)\]

So \(\{S_n(\omega)\}\) is a random walk. If we go one step further, we can take \[\bar{X}_n(\omega) = \frac{S_n(\omega)}{n} \approx \mathbb{E}X_1\] The average is only approximately equal to the expected value \(\mathbb{E}X_1\), and only when \(n\) is large.

Additionally, we have the law of the iterated logarithm.

Now we will state the central limit theorem. Define the (signed) error

\[e_n(\omega) = \bar{X}_n(\omega) - \mathbb{E}X_1\]

Then, heuristically,

\[\sqrt{n} \cdot e_n(\omega) \sim N(0, \text{Var } X_1)\]

when \(n\) is large.

Okay, now we will state it in full.

Central Limit Theorem

Let \(\{X_i\}_{i=1}^{\infty}\) be a sequence of i.i.d. RVs on \((\Omega, \mathbb{P})\). Suppose \(\mathbb{E}X_1\) and \(\text{Var }X_1\) exist. We define a sequence of RVs \(\{G_n(\omega)\}_{n=1}^{\infty}\) by

\[G_n(\omega) = \sqrt{n} \cdot e_n(\omega) = \sqrt{n} (\bar{X}_n(\omega) - \mathbb{E}X_1)\]

Heuristic version: when \(n\) is large, \(G_n\) looks like a RV following \(N(0, \text{Var}X_1)\).

Rigorous version: The CDF of \(G_n\) converges to the CDF of \(N(0, \text{Var}X_1)\) as \(n \rightarrow \infty\): \[\lim_{n \rightarrow \infty} \mathbb{P}(\{\omega \in \Omega: G_n(\omega) \leq x\}) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi \cdot \text{Var} X_1}} \cdot \exp\left(-\frac{t^2}{2\,\text{Var} X_1}\right) dt\]

Under the conditions of the CLT, we have the following. Heuristic version:

\(\frac{G_n}{\sqrt{\text{Var }{X_1}}}\) looks like a RV following \(N(0,1)\) when \(n\) is large.

Rigorous version: the CDF of \(\frac{G_n(\omega)}{\sqrt{\text{Var }X_1}}\) converges to the CDF of \(N(0, 1)\).

This is as follows:

\[ \lim_{n \rightarrow \infty} \mathbb{P}(\{\omega \in \Omega: \frac{G_n(\omega)}{\sqrt{\text{Var } X_1}} \leq x \}) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} \cdot \exp(-\frac{t^2}{2})dt \]
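As a sanity check, here is a simulation sketch (Bernoulli steps, with our own choices of \(n\), number of repetitions, and seed) comparing the empirical CDF of \(G_n/\sqrt{\text{Var }X_1}\) with \(\Phi\) at a couple of points:

```python
import math
import random

random.seed(4)

# X_i ~ Bernoulli(1/2), so E X_1 = 1/2 and sqrt(Var X_1) = 1/2.
p, n, trials = 0.5, 2_000, 2_000
sd = math.sqrt(p * (1 - p))

def phi(x):
    # standard normal CDF, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Normalized errors G_n / sqrt(Var X_1) = sqrt(n) * (xbar - p) / sd
zs = []
for _ in range(trials):
    xbar = sum(random.random() < p for _ in range(n)) / n
    zs.append(math.sqrt(n) * (xbar - p) / sd)

# Empirical CDF at a couple of points should be close to Phi there.
frac_below_0 = sum(z <= 0 for z in zs) / trials
frac_below_196 = sum(z <= 1.96 for z in zs) / trials
```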

Review of CLT

Let \(\{X_i\}_{i=1}^{\infty}\) be a sequence of independent and identically distributed RVs, and suppose the expected values and variances exist. We have

\[\sqrt{n}(\bar{X}_n - \mathbb{E}X_1) \sim N(0,\text{Var }X_1)\] \[\sqrt{n}\frac{(\bar{X}_n - \mathbb{E}X_1)}{\sqrt{\text{Var }X_1}} \sim N(0,1)\]

when \(n\) is large.

LLN Approximation Error, from the CLT Viewpoint

We have two quantities, the average \(\bar{X_n}(\omega)\) and the expected value \(\mathbb{E}X_1\). We have

\[e_n(\omega) = \bar{X_n}(\omega) - \mathbb{E}X_1\]

We can also write this as

\[e_n(\omega) = \frac{\sqrt{\text{Var }X_1}}{\sqrt{n}} Z_n(\omega)\]

where \(Z_n(\omega) = \sqrt{n}\, e_n(\omega)/\sqrt{\text{Var }X_1}\) is the normalized error from the CLT.

For any \(\delta \geq 0\):

\[\mathbb{P}(\{\omega \in \Omega: | e_n(\omega) | \leq \delta \frac{\sqrt{\text{Var }X_1}}{\sqrt{n}}\})\] \[ = \mathbb{P}(\{\omega \in \Omega: | Z_n(\omega) | \leq \delta \})\] \[ = \mathbb{P}( -\delta \leq Z_n \leq \delta)\] \[ = F_{Z_n}(\delta) - F_{Z_n}(-\delta) + \mathbb{P}(Z_n = -\delta)\]

Let \(\Phi(x)\) be the CDF of \(N(0,1)\). By the CLT, \[\lim_{n \rightarrow \infty} F_{Z_n}(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}\exp(-\frac{t^2}{2}) dt = \Phi(x)\]

We have the following simplification: \[F_{Z_n}(\delta) - F_{Z_n}(-\delta) + \mathbb{P}(Z_n = -\delta)\] \[\approx \Phi(\delta) - \Phi(-\delta)\] \[= 2 \Phi(\delta) - 1\]

If we choose \(\delta^{*}\) so that \(\Phi(\delta^{*}) = 0.975\), then \(2\Phi(\delta^{*}) - 1 = 0.95\). From a normal table, \[\delta^{*} = 1.96\]

\[\mathbb{P}(\{\omega \in \Omega: | e_n(\omega) | \leq \delta^{*} \frac{\sqrt{\text{Var }X_1}}{\sqrt{n}}\}) \approx 2\Phi(\delta^{*}) - 1 = 0.95\]

With 95% confidence, the approximation error satisfies \[| e_n(\omega) | \leq \delta^{*} \frac{\sqrt{\text{Var }X_1}}{\sqrt{n}}\]

Law of the iterated logarithm: we had \[\mathbb{P}(\{\omega \in \Omega: |e_n(\omega)| \leq \sqrt{2\log\log n} \cdot \frac{\sqrt{\text{ Var } X_1}}{\sqrt{n}} \}) \approx 1\]

With 100% confidence we have \[|e_n(\omega)| \leq \sqrt{2\log\log n} \cdot \frac{\sqrt{\text{ Var } X_1}}{\sqrt{n}}\] When \(n\) is large!

When \(n > 1000\), \(\sqrt{2\log\log n} > 1.96 = \delta^{*}\).
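This threshold is easy to check numerically; a quick sketch:

```python
import math

def lil_factor(n):
    # the sqrt(2 log log n) factor from the law of the iterated logarithm
    return math.sqrt(2 * math.log(math.log(n)))

# The factor crosses the 95% normal quantile 1.96 around n = 1000
# and keeps growing (very slowly) after that.
values = {n: lil_factor(n) for n in (100, 1000, 10_000, 10**6)}
```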

The LIL gives an upper bound that holds with (asymptotic) certainty, but this bound is large. With the central limit theorem, at the cost of 5% confidence, we get a smaller upper bound.

Proof of Central Limit Theorem

Let \(G_1, G_2, \ldots\) be a sequence of RVs. We say \(G_n\) converges weakly to a continuous RV \(G\) if

\[\lim_{n \rightarrow \infty} F_{G_n}(x) = F_{G}(x)\]

for all real numbers \(x\).

This is briefly denoted as \(G_n \rightarrow_{w} G\).

Let \(X_1, X_2, \ldots\) be i.i.d. RVs whose expected values and variances exist, and define the error-like RV \(G_n = \sqrt{n}(\bar{X}_n - \mathbb{E}X_1)\). The CLT says exactly that \(G_n\) converges weakly to \(G \sim N(0, \text{Var }X_1)\):

\[\lim_{n \rightarrow \infty} F_{G_n}(x) = \text{the CDF of N}(0, Var X_1) = F_{G}(x)\]

Moment-generating Functions

Moment-generating functions are used to describe a probability distribution.

Let \(X\) be a RV.

We define a moment-generating function as

\[M_X(t) = \mathbb{E}[e^{t\cdot X}]\]

provided that the expected value exists for all \(t \in \mathbb{R}\).

\[e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} \ldots\]

The expected value of a (finite) sum of functions of a random variable is the sum of the expected values:

\[\mathbb{E}[\sum_{j=1}^{J}c_jg_j(X)] = \sum_{j=1}^{J}c_j \mathbb{E}[g_j(X)]\]

Applying this to \(M_X(t)\), we have \[M_X(t) = \mathbb{E}[e^{t\cdot X}] = 1 + t \mathbb{E}X + \frac{t^2 \mathbb{E}[X^2]}{2!} + \frac{t^3 \mathbb{E}[X^3]}{3!} + \ldots \]

So then our derivative is \[ \frac{d}{dt} M_X(t) = \mathbb{E}X + t\mathbb{E}[X^2] + \frac{t^2}{2!}\mathbb{E}[X^3] + \ldots \]

At \(t = 0\), the derivative gives the first moment: \[ \frac{d}{dt} M_X(t) \Big|_{t=0} = \mathbb{E}X \]

The second derivative at \(t = 0\) gives the second moment: \[ \frac{d^2}{dt^2} M_X(t) \Big|_{t=0} = \mathbb{E}[X^2] \]

In general, the \(k\)-th derivative at \(t = 0\) gives the \(k\)-th moment: \[ \frac{d^k}{dt^k} M_X(t) \Big|_{t=0} = \mathbb{E}[X^k] \]
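A numerical sketch of these moment formulas for a distribution whose MGF has a simple closed form: for \(X \sim\) Bernoulli(\(p\)), \(M_X(t) = (1-p) + pe^t\), and finite differences at \(t = 0\) should recover \(\mathbb{E}X = p\) and \(\mathbb{E}[X^2] = p\). The choice \(p = 0.3\) is ours.

```python
import math

p = 0.3

def M(t):
    # MGF of Bernoulli(p): E[e^{tX}] = (1 - p) * e^0 + p * e^t
    return (1 - p) + p * math.exp(t)

# Approximate the derivatives at t = 0 by finite differences.
h = 1e-5
first_moment = (M(h) - M(-h)) / (2 * h)            # ~ E[X] = p
second_moment = (M(h) - 2 * M(0) + M(-h)) / h**2   # ~ E[X^2] = p
```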

Theorem

Let \(G, G_1, G_2, \ldots\) be RVs.

Recall that \[M_{G_n}(t) = \mathbb{E}[e^{t G_n}]\] which, when \(G_n\) is continuous with PDF \(p_{G_n}\), equals \[\int_{-\infty}^{\infty} e^{tx}p_{G_n}(x)\, dx\]

If \(\lim_{n \rightarrow \infty} M_{G_n}(t) = M_G(t)\) for all \(t\), then

\[G_n \rightarrow_{w} G\] If the moment-generating functions converge, then the random variables converge weakly!

Theorem

Let \(X_1, X_2, \ldots, X_n\) be RVs. \[S_n(\omega) = X_1(\omega) + X_2(\omega) + \ldots + X_n(\omega) = \sum_{i=1}^{n}X_i(\omega)\]

If \(X_1, X_2, \ldots, X_n\) are independent:

\[M_{S_n}(t) = \prod_{i=1}^{n} M_{X_i}(t)\]

Furthermore, if \(X_1, X_2, \ldots, X_n\) are also identically distributed:

\[M_{S_n}(t) = (M_{X_1}(t))^n\]
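A quick empirical sketch of the i.i.d. case (the distribution, \(t\), and sample sizes are our choices): estimate \(M_{S_n}(t) = \mathbb{E}[e^{tS_n}]\) by simulation and compare with \((M_{X_1}(t))^n\).

```python
import math
import random

random.seed(5)

# X_i ~ Bernoulli(p), i.i.d., so M_{X_1}(t) = (1 - p) + p * e^t.
p, n, t, trials = 0.4, 5, 0.7, 200_000
closed_form = ((1 - p) + p * math.exp(t)) ** n  # (M_{X_1}(t))^n

# Estimate E[e^{t * S_n}] directly by simulating S_n many times.
total = 0.0
for _ in range(trials):
    s_n = sum(random.random() < p for _ in range(n))
    total += math.exp(t * s_n)
empirical = total / trials
```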

Two more lemmas

Lemma 1: Let \(\{c_n\}_{n=1}^{\infty}\) be a sequence of real numbers satisfying \(\lim_{n \rightarrow \infty} c_n = 0\). If \(\lim_{n \rightarrow \infty} n \cdot c_n = \lambda\), then \[ \lim_{n \rightarrow \infty} (1 + c_n)^n = e^{\lambda}\]

Lemma 2:

If \(G \sim N(0, \sigma^2)\), then \[M_G(t) = e^{\frac{t^2\sigma^2}{2}}\]

Proof of CLT

\[G_n = \sqrt{n}\left[\left(\frac{1}{n} \sum_{i=1}^n X_i\right) - \mathbb{E}X_1\right]\]

We move the \(\sqrt{n}\) inside and distribute \(\mathbb{E}X_1\) across the sum:

\[G_n = \frac{1}{\sqrt{n}}[\sum_{i=1}^n (X_i - \mathbb{E}X_1) ]\]

Given this, we can substitute \(G_n\) into the MGF:

\[M_{G_n}(t) = \mathbb{E}[e^{tG_n}] = \mathbb{E}[e^{\frac{t}{\sqrt{n}} \sum_{i=1}^n (X_i - \mathbb{E}X_1)}]\]

Because our random variables are independent, the expectation of the product factors, and because they are identically distributed, the factors are equal:

\[M_{G_n}(t) = \left(\mathbb{E}[e^{\frac{t}{\sqrt{n}} (X_1 - \mathbb{E}X_1)}]\right)^n\]

Expanding the exponential as a power series,

\[M_{G_n}(t) = \left(\mathbb{E}\left[1 + \frac{t}{\sqrt{n}}(X_1 - \mathbb{E}X_1) + \frac{t^2 (X_1 - \mathbb{E}X_1)^2}{2n} + \ldots\right]\right)^n\]

\[M_{G_n}(t) = \left(1 + \frac{t}{\sqrt{n}} \cdot 0 + \frac{t^2\, \mathbb{E}[(X_1 - \mathbb{E}X_1)^2]}{2n} + \ldots\right)^n\]

\[\approx \left(1 + \frac{t^2}{2n} \text{Var }X_1\right)^n = (1 + c_n)^n\]

with \(c_n = \frac{t^2 \text{Var }X_1}{2n}\), so \(n \cdot c_n \rightarrow \frac{t^2}{2}\text{Var }X_1\). By Lemma 1,

\[M_{G_n}(t) = (1 + c_n)^n \rightarrow e^{\frac{t^2}{2} \text{Var }X_1}\]

This is the moment-generating function of \(N(0, \text{Var}(X_1))\), by Lemma 2.

Thus, by the MGF convergence theorem above, the CDF converges:

\[\lim_{n \rightarrow \infty} F_{G_n}(x) = \text{the CDF of N}(0, Var X_1) = F_{G}(x)\]