We know the central limit theorem: for $\{X_i\}$ i.i.d. with finite variance,
$$
\sqrt{n} (\bar{X}_n - \mathbb{E}[X] ) \Rightarrow \mathcal{N}(0, \mathrm{Var}(X))
$$
However, this only describes fluctuations of order $\sigma/\sqrt{n}$ around the mean; what happens to the distribution far from the mean?
This calls for large deviation estimates. Large deviations are a basic tool in mathematics and are used everywhere in probability and statistics. For probabilists, they give the probability of atypical events and sometimes estimates of correlation functions. For statisticians, they give non-asymptotic confidence intervals. In a word, they are a necessary tool.
Chernoff's inequality
We start from the simple Markov inequality: writing $S_n = X_1 + \cdots + X_n$, for every $\theta > 0$,
$$
\mathbb{P}\left[S_n / n \geq x\right] = \mathbb{P}\left[e^{\theta S_n} \geq e^{n \theta x}\right] \leq e^{-n \theta x} \left(\mathbb{E}\left[e^{\theta X}\right]\right)^n
$$
Optimizing over $\theta$, we get, for $x > \mathbb{E}[X]$,
$$
\mathbb{P}\left[S_n / n \geq x\right] \leq e^{- n I(x)}
$$
where we define
$$\begin{eqnarray}
\phi(\theta) &=& \log \mathbb{E}[e^{\theta X}] \\
I(x) &=& \sup_{\theta \in \mathbb{R}} \left( \theta x - \phi(\theta) \right)
\end{eqnarray}$$
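As a quick illustration (a standard computation, added here for concreteness), take $X \sim \mathcal{N}(\mu, \sigma^2)$. Then
$$
\phi(\theta) = \mu \theta + \frac{\sigma^2 \theta^2}{2}, \qquad I(x) = \sup_{\theta \in \mathbb{R}} \left( \theta x - \mu \theta - \frac{\sigma^2 \theta^2}{2} \right) = \frac{(x - \mu)^2}{2 \sigma^2},
$$
so the Chernoff bound reads $\mathbb{P}[S_n/n \geq x] \leq e^{-n (x-\mu)^2 / (2\sigma^2)}$ for $x > \mu$, which matches the Gaussian tail up to the polynomial prefactor.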
However, from this simple inequality, many techniques have been developed.
Hoeffding's inequality
We can give a better estimate when $X$ is bounded in $[a,b]$, namely
$$
\mathbb{P}\left[S_n / n - \mathbb{E}[X] > x\right] \leq \exp\left(- \frac{2 n x^2}{(b-a)^2}\right)
$$
The idea is to expand the function $\phi$ around $0$ and obtain a good bound on it (Hoeffding's lemma).
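Concretely (a standard computation, spelled out here for completeness), Hoeffding's lemma controls the centered log-moment generating function, and optimizing the Chernoff exponent gives the constant $2/(b-a)^2$:
$$
\log \mathbb{E}\left[e^{\theta (X - \mathbb{E}[X])}\right] \leq \frac{\theta^2 (b-a)^2}{8}, \qquad \sup_{\theta > 0} \left( \theta x - \frac{\theta^2 (b-a)^2}{8} \right) = \frac{2 x^2}{(b-a)^2}.
$$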
Generalized versions are also possible. For example, we can consider not just one function but a family of functions, in other words a dictionary, as in the bound below. The more general theorems then rely largely on covering, or approximation, arguments.
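For instance (a simple illustration via the union bound, not the more refined covering arguments mentioned above), for a finite dictionary $f_1, \dots, f_M$ of functions with values in $[a,b]$,
$$
\mathbb{P}\left[\max_{1 \leq j \leq M} \left| \frac{1}{n} \sum_{i=1}^{n} f_j(X_i) - \mathbb{E}[f_j(X)] \right| > x\right] \leq 2 M \exp\left(- \frac{2 n x^2}{(b-a)^2}\right).
$$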
Inequality for Gaussian variables
However, a big obstacle with Hoeffding's inequality is the boundedness condition. How should we treat unbounded variables, for example Gaussian ones, which form a large class?
One idea is an entropy-type inequality. The entropy of a function under the measure $\mu$ is defined as
$$ \mathrm{Ent}_{\mu}(f) = \mathbb{E}_{\mu}(f \log{f}) - \mathbb{E}_{\mu}(f) \log{(\mathbb{E}_{\mu}(f))}$$
In some cases, the measure $\mu$ satisfies a log-Sobolev inequality:
$$ Ent_{\mu}(f^2) \leq C_{\mu} \mathbb{E}_{\mu}(|\nabla f|^2) $$
In this case we can deduce, for $Y \sim \mu$ and $f$ Lipschitz, the concentration inequality
$$ \mathbb{P}(|f(Y) - \mathbb{E}[f(Y)]| > \epsilon) \leq 2 \exp{\left(-\frac{\epsilon^2}{C_{\mu}\|f\|^2_{\mathrm{Lip}}}\right)}$$
This is a large-deviation-type inequality. A natural question is whether a given measure satisfies the log-Sobolev inequality. This requires some analysis; for the Gaussian measure the answer is positive. The general case is a whole branch of research on its own.
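For example (a classical fact recalled here, not part of the original notes): the standard Gaussian measure $\gamma$ on $\mathbb{R}^d$ satisfies the log-Sobolev inequality with $C_{\gamma} = 2$, which yields the classical Gaussian concentration bound
$$ \mathbb{P}(|f(Y) - \mathbb{E}[f(Y)]| > \epsilon) \leq 2 \exp\left(-\frac{\epsilon^2}{2 \|f\|^2_{\mathrm{Lip}}}\right), \qquad Y \sim \mathcal{N}(0, I_d),$$
valid for every Lipschitz function $f$, independently of the dimension $d$.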
Cramér's theorem
A more general principle is Cramér's theorem, which gives not only an upper bound but also a lower bound: for a Borel set $\Gamma$,$$\begin{eqnarray}
-\inf_{x \in \Gamma^{O}}I(x)
&\leq& \liminf_{n \rightarrow \infty} \frac{1}{n} \log \mathbb{P}\left[S_n / n \in \Gamma \right] \\
&\leq& \limsup_{n \rightarrow \infty} \frac{1}{n} \log \mathbb{P}\left[S_n / n \in \Gamma \right]
\leq -\inf_{x \in \Gamma^{F}}I(x)
\end{eqnarray}$$
where $\Gamma^{O}$ denotes the interior of $\Gamma$ and $\Gamma^{F}$ its closure.
In the course, we analyse in detail some properties of the functions $\phi(\theta)$ and $I(x)$, such as their convexity, zeros and monotonicity. The upper bound is proved much like the Chernoff bound, while the lower bound uses a change of probability measure.
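The key step of that change of measure is exponential tilting (a standard sketch, not the detailed proof from the course): for $\theta$ in the domain of $\phi$, define the tilted law
$$ d\mu_{\theta}(y) = e^{\theta y - \phi(\theta)} \, d\mu(y), \qquad \mathbb{E}_{\mu_{\theta}}[X] = \phi'(\theta).$$
Choosing $\theta$ with $\phi'(\theta) = x$ makes the deviation $S_n/n \approx x$ typical under $\mu_{\theta}$, and the cost of switching back to $\mu$ is, to leading exponential order, $e^{-n(\theta x - \phi(\theta))} = e^{-n I(x)}$.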