Wednesday, September 13, 2017

Probability review (4): convergence in law and the CLT

We continue reviewing basic probability, and in this part we attack convergence in law. There may already be a post on convergence in a metric space, but the CLT and the characteristic function are still worth a discussion of their own.

Convergence in law on $\mathbb{R}$

The definition of convergence in law on $\mathbb{R}$ is quite simple:
$$ F_n(x) \rightarrow F(x) $$
at every continuity point of $F$. A bit of analysis shows that when the limit $F$ is continuous, this convergence is in fact uniform. There are several ways to characterize the convergence. For example, we say $\mu_n \Rightarrow \mu$ if for every bounded continuous function $f$,
$$\mu_n(f) \rightarrow \mu(f)$$

There are other criteria, via open and closed sets:
$$\forall O \text{ open}, \liminf \mathbb{P}(X_n \in O) \geq \mathbb{P}(X \in O)$$
One way to memorize the direction of the inequality is a sequence of Dirac masses converging to a point on the boundary of the set. The closed-set and boundary versions are just as easy to state.

We remark that on $\mathbb{R}$ the Skorokhod representation is quite direct: if $X_n \Rightarrow X$, we can put them all on a common probability space such that $X_n \rightarrow X$ almost surely.

There are other specific cases, such as Scheffé's theorem, which tells us that if the random variables have densities $f_n$ with $f_n \rightarrow f$ pointwise, then we also get convergence in law, because each density integrates to 1.

Tightness

We single out the notion of tightness because it plays a very important role and can be generalized to other spaces. Roughly speaking, a sequence is tight if and only if $\forall \epsilon > 0$ there exists a compact set $K_{\epsilon}$ such that
$$\sup_n \mathbb{P}(X_n \notin K_{\epsilon}) < \epsilon.$$

The interest of this property is Prokhorov's theorem: a family of measures is relatively compact if and only if it is tight. Hence proving tightness is an important part of establishing convergence in law. A standard strategy is the argument "tightness + convergence of the marginal laws".

Characteristic function

We also use the characteristic function to analyze the weak convergence of random variables. The deep reason is Fourier analysis, since the characteristic function is the Fourier transform of a measure. Durrett's book contains many applications of the characteristic function. A useful formula is the inversion formula
$$ \mu((a,b)) + \frac{1}{2}\left(\mu(\{a\}) + \mu(\{b\})\right) = \frac{1}{2\pi} \lim_{T \rightarrow \infty} \int_{-T}^{T} \frac{e^{-ita}-e^{-itb}}{it}\phi(t)\, dt$$

In the case where the characteristic function is integrable, the measure has a density, and we can recover it by the inverse transform:
$$ f(x) = \frac{1}{2\pi} \int e^{-itx} \phi(t)\, dt $$
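As a sanity check, here is a minimal numerical sketch (plain numpy; the standard normal, whose characteristic function is $e^{-t^2/2}$, is my choice of test case): we recover the density from the inversion formula by a truncated Riemann sum.

```python
import numpy as np

# Characteristic function of the standard normal: phi(t) = exp(-t^2 / 2).
phi = lambda t: np.exp(-t ** 2 / 2)

# f(x) = (1/2pi) * integral of exp(-itx) phi(t) dt, truncated to [-T, T];
# the integrand decays fast, so T = 10 is plenty here.
t = np.linspace(-10, 10, 20001)
dt = t[1] - t[0]

for x in [0.0, 1.0, 2.0]:
    f_x = (np.exp(-1j * t * x) * phi(t)).sum().real * dt / (2 * np.pi)
    exact = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
    print(x, f_x, exact)   # the two columns agree to high precision
```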

Combining the tightness property with characteristic functions: if the characteristic functions of a sequence of measures converge pointwise, we get a priori only the so-called vague convergence, i.e. the limiting "distribution function" need not be a true distribution function, because mass can be lost at infinity. However, if the limit function is continuous at $0$, the sequence is tight, so it converges to a genuine probability measure (Lévy's continuity theorem).

CLT, Poisson and stable laws

Finally, we say a little about the CLT and stable laws. The classical proof of the CLT via characteristic functions is direct, but one should know that it covers a very idealized situation. In fact, convergence in law is somewhat robust, in the sense that the theorem remains true under weaker conditions; for the CLT, this is the Lindeberg-Feller theorem.

Convergence to a Poisson law is often called "the law of small numbers" because it approximates the probabilities of rare events. One can also measure the speed of convergence in total variation distance, which is a good distance for convergence in law on a discrete state space.
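A quick numeric illustration (pure Python; the comparison Binomial(n, 2/n) vs Poisson(2) is my choice of example): the total variation distance decays like $1/n$, consistent with Le Cam's bound $np^2 = \lambda^2/n$.

```python
import math

def tv_binomial_poisson(n, lam):
    """Total variation distance between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n
    binom = (1 - p) ** n            # P(Bin = 0)
    poiss = math.exp(-lam)          # P(Poisson = 0)
    tv, poisson_mass = abs(binom - poiss), poiss
    for k in range(1, n + 1):
        binom *= (n - k + 1) / k * p / (1 - p)   # pmf recursions
        poiss *= lam / k
        tv += abs(binom - poiss)
        poisson_mass += poiss
    return 0.5 * (tv + (1.0 - poisson_mass))     # Poisson tail beyond n

for n in [10, 100, 1000]:
    print(n, tv_binomial_poisson(n, lam=2.0), "Le Cam bound:", 4.0 / n)
```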

Concerning stable laws, this is the situation where the variance is infinite, so we use a different normalization. The variables $X$ studied in this problem have tails decaying like $x^{- \alpha}, 0 < \alpha < 2$. The limit $Y$ has the property that $\forall n, \exists a_n, b_n$ such that
$$\frac{\sum_{k=1}^n Y_k - b_n}{a_n} \overset{d}{=} Y.$$
For the parameter $\alpha$, the normalization is $n^{\frac{1}{\alpha}}$.

On a more general Polish space, some of these definitions remain well defined but others no longer work, above all those using the characteristic function, which requires the existence of a Fourier transform.

 

Friday, September 1, 2017

Probability review (3): the law of large numbers

The law of large numbers is a very important theorem in probability, but its proof can be quite technical under different sets of assumptions. In what follows, we tell a few stories about the law of large numbers.

The most classical theory supposes that the $\{X_i\}$ are i.i.d. with finite variance. Under this condition, Chebyshev's inequality gives
$$
 \mathbb{P}\left[\left|\frac{S_n}{n} - \mathbb{E}X\right| > \epsilon\right] <  \frac{Var(X)}{n \epsilon^2}
$$
which suffices to give the weak law. Concerning the strong law, we choose a subsequence, prove a.s. convergence along it thanks to the Borel-Cantelli lemma, then control the errors in between. This technique is the basis of many proofs.

To go further, one direction is to remove the variance assumption. In this case, one has to use the truncation technique:
$$
Y_n = X_n \mathbb{1} _{\{|X_n| \leq n\}}
$$
This technique has remarkable but non-obvious properties:
(1) With probability 1, only finitely many $n$ satisfy $X_n \neq Y_n$;
(2) $\sum_n Var(Y_n / n) < \infty$.

The first property is very useful: it tells us that the study of the convergence reduces to the behavior of the $Y_n$. The second has many applications. Either we follow the same path as the classical method - prove convergence along a subsequence and control the errors in between - or we use Kolmogorov's method.

Kolmogorov's method, in fact, uses the idea of martingales. Using $L^2$ martingale convergence, one shows that $\sum_n \frac{Y_n - \mathbb{E}Y_n}{n}$ converges a.s. Then Kronecker's lemma - a purely analytic lemma - tells us that in this situation, $\frac{\sum_{k=1}^n Y_k}{n}$ converges.
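A small simulation sketch (numpy; the Pareto law with $\alpha = 1.5$, of mean $3$ and infinite variance, is my choice), illustrating that the strong law still holds without a second moment - exactly the regime the truncation technique is made for.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.5                        # tail exponent: finite mean, infinite variance
mean = alpha / (alpha - 1)         # = 3 for the Pareto law on [1, infinity)

x = 1.0 + rng.pareto(alpha, size=10**6)   # density alpha / x^(alpha+1), x >= 1
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in [10**2, 10**4, 10**6]:
    print(n, running_mean[n - 1], "->", mean)   # slow but visible convergence
```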

The situation without $L^1$ can also be studied, but that depends on the proof at hand. The weak law generalizes to correlated random variables or to the weak-$L^1$ case. Sometimes, one also tries other normalizations.

Saturday, July 15, 2017

A small note on the mixing time of Markov chains

This is a small note written after the notes by Nathanaël Berestycki; the goal is just to review some interesting stories about this subject and to prepare for the coming summer school. I decided to write it on my blog since this lets me write down the ideas as quickly as possible without paying too much attention to details and technicalities.

Basic notation of mixing time

First, we should talk about the motivation for studying this subject. In abstract measure theory, we study the topology of Radon measures on a metric space and distances on this space of measures; a distance which induces the weak-star convergence is the Lévy-Prokhorov distance. That theory is quite deep, while the discrete-state situation is much easier, since the relevant topologies coincide and we can use the total variation distance
$$\| \mu - \nu \|_{TV} = \sup_{A \subset S} \mu(A) - \nu(A)$$
to represent the distance between two measures, which is simple and elegant. On the other hand, the subject also arose because of the emergence of many computer algorithms: we have to know whether they converge and, moreover, the quantitative behavior of the convergence. In particular, the cutoff phenomenon tells us that we sometimes have to run enough steps of a Markov chain to make sure that it has converged.

There are other equivalent expressions of the total variation distance, like
$$\begin{eqnarray*}\| \mu - \nu \|_{TV} &=& \sup_{A \subset S} \mu(A) - \nu(A) \\ &=& \frac{1}{2}\sum_{x\in S} |\mu(x) - \nu(x)| \end{eqnarray*}$$

Concerning the distance of a Markov chain from equilibrium: if it has an invariant measure $\pi$, we use
$$d(t) = \sup_{x \in S}\|P^t(x, \cdot) -  \pi(\cdot)  \|_{TV}$$
to describe the distance from the invariant measure. It often interacts with another quantity,
$$ \rho(t) = \sup_{x,y \in S}\|P^t(x, \cdot) -  P^t(y, \cdot) \|_{TV} $$
since the two are equivalent:
$$ d(t) \leq \rho(t) \leq 2 d(t).$$
The mixing time is defined as
$$t_{mix}(\epsilon) = \inf \{t \mid d(t) \leq \epsilon \}$$
and we usually use $t_{mix} = t_{mix}(1/e)$.

Convergence to the invariant measure follows from the Perron-Frobenius theorem for irreducible, aperiodic, finite-state Markov chains, and sometimes we also use Lyapunov functions to treat the infinite-state case. However, these criteria aren't sharp and only give a qualitative description.

Before finishing this section, we recall the cutoff phenomenon. Generally speaking, it means that $d(t)$ falls drastically near the point $t_{mix}$. This phenomenon is a little hard to understand since, up to now, no universal method to analyse it has been found; the models are studied case by case. Moreover, sometimes it happens and sometimes it can be avoided, and we don't see it in simulations.

Coupling technique

To estimate the speed of convergence, a first technique is called coupling; it may be the most useful and powerful method, applying in all cases. A coupling of measures $\mu, \nu$ is a pair $(X,Y)$ where $X$ has law $\mu$ and $Y$ has law $\nu$. We remark that the marginal laws can never recover the joint distribution, so this condition doesn't fix the law of $(X,Y)$.

A third expression of the total variation distance is
$$\| \mu - \nu \|_{TV} = \inf_{couplings}\mathbb{P}(X \ne Y).$$
This formula gives an upper bound on $d(t)$, and hence on $t_{mix}$, since
$$\begin{eqnarray*} d(t) \leq \rho(t) &=& \sup_{x,y \in S}\|P^t(x, \cdot) -  P^t(y, \cdot) \|_{TV} \\&\leq& \sup \mathbb{P}(X_t \ne Y_t) \end{eqnarray*}$$
and a useful trick is to demand that once the two Markov chains meet, they move together - this yields
$$\boxed{d(t) \leq \sup_x \mathbb{P}_x(\tau_{X,Y} > t)}$$
where $\tau_{X,Y}$ is the coalescence time of the coupling.

One example is the lazy random walk on the cycle $\mathbb{Z} / n\mathbb{Z}$. The Markov chain stays in place with probability $1/2$ and moves forward or backward with probability $1/4$ each. We start two chains $(X,Y)$ and let a first coin decide which one moves and a second coin decide the direction. Once they meet, they move together. The meeting of the two chains then reduces to the gambler's ruin problem. Using the first moment (Markov's inequality),
$$d(t) \leq \sup_x \mathbb{P}_x(\tau_{X,Y} > t) \leq \frac{\mathbb{E}(\tau_{X,Y})}{t} \leq \frac{n^2}{t}$$
from which we deduce
$$t_{mix} \leq e n^2.$$

At the end of this part, we give a small lemma on the sub-multiplicativity of $\rho$:
$$\rho(s+t) \leq \rho(s) \rho(t)$$
The proof is easy by the coupling technique.

Strong stationary time technique

A strong stationary time is a very special technique: it is a randomized stopping time $\tau$ such that at time $\tau$ the law of the chain is exactly the invariant law. Thus
$$d(t) \leq \mathbb{P}(\tau > t).$$

This idea sounds crazy, but models that function like this do exist. One famous example is the top-to-random shuffle. In this model, the cards under the original bottom card are uniformly random, since a priori there is no order among them. Therefore, once the original bottom card comes to the top and is inserted back into the deck, the deck is totally random. The mixing time is easy to compute and is the same as the coupon collector problem: $t_{mix} \sim n\log n$.
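A quick simulation of this strong stationary time (pure Python; deck size $52$ is my choice): one step after the original bottom card reaches the top, the deck is uniform, and the mean of $\tau$ matches the coupon-collector expectation $n H_{n-1} + 1 \approx n \log n$.

```python
import random

rng = random.Random(2)

def strong_stationary_time(n):
    """Top-to-random shuffle: tau = one step after the original
    bottom card first reaches the top of the deck."""
    deck = list(range(n))        # deck[0] is the top; card n-1 starts at the bottom
    t = 0
    while deck[0] != n - 1:
        card = deck.pop(0)
        deck.insert(rng.randrange(n), card)   # uniform among the n slots
        t += 1
    return t + 1                 # one more insertion randomizes everything

n = 52
samples = [strong_stationary_time(n) for _ in range(500)]
exact = n * sum(1 / k for k in range(1, n)) + 1   # n * H_{n-1} + 1
print(sum(samples) / len(samples), exact)
```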

However, this technique applies only in some specific cases and requires a good observation.

Spectral technique: viewpoint from operators

The spectral method is especially powerful in the case of reversible Markov chains; it relates the speed of convergence to the spectral gap. In this way, the spectral gap $\gamma$, the convergence in total variation distance, and the expansion ratio are all naturally related.

We know that an irreducible, aperiodic, finite, reversible Markov chain has real eigenvalues $1 = \lambda_1 > \lambda_2 \geq \lambda_3 \cdots \lambda_n \geq -1$. We could extract an orthogonal family of eigenvectors by diagonalisation; however, we would like a basis orthonormal with respect to the inner product
$$\langle f, g\rangle_{\pi} = \sum_{x \in S} f(x)g(x)\pi(x)$$
instead of the usual $L^2$ inner product. To realize this, we find an orthonormal basis $\phi_i$ of the operator $A$, where
$$A(x,y) = \sqrt{\frac{\pi(x)}{\pi(y)}}P(x,y) \Leftrightarrow A = D^{1/2}PD^{-1/2}, \qquad D = diag(\pi),$$
which is symmetric and shares the same eigenvalues as $P$, that is,
$$A\phi_i = \lambda_i \phi_i.$$
Then, defining $f_i = D^{-1/2}\phi_i$, we have
$$\langle f_i, f_j\rangle_{\pi} = \langle \phi_i, \phi_j \rangle = \delta_{ij}$$
$$Pf_i = D^{-1/2}(D^{1/2}PD^{-1/2})\phi_i = D^{-1/2}A\phi_i = \lambda_i D^{-1/2}\phi_i = \lambda_i f_i$$
So the $f_i$ form the desired orthonormal basis. The spectral decomposition
$$A^t = \sum_{i=1}^n \lambda_i^t \phi_i \phi_i^T $$
becomes, when we plug in the $f_i$,
$$\boxed{\frac{P^t(x,y)}{\pi(y)} = \sum_{i=1}^n f_i(x) f_i(y) \lambda_i^t}$$
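As a check, here is a numpy sketch verifying the boxed identity on a 3-state reversible chain (the weighted triangle graph is my choice of example).

```python
import numpy as np

# A small reversible chain: random walk on a weighted triangle graph.
W = np.array([[0., 2., 1.],
              [2., 0., 3.],
              [1., 3., 0.]])
P = W / W.sum(axis=1, keepdims=True)
pi = W.sum(axis=1) / W.sum()             # reversible stationary law

D = np.diag(np.sqrt(pi))
D_inv = np.diag(1 / np.sqrt(pi))
A = D @ P @ D_inv                        # A = D^{1/2} P D^{-1/2}, symmetric
lam, phi = np.linalg.eigh(A)             # orthonormal eigenbasis of A
f = D_inv @ phi                          # columns f_i = D^{-1/2} phi_i

t = 5
lhs = np.linalg.matrix_power(P, t) / pi  # P^t(x, y) / pi(y)
rhs = f @ np.diag(lam ** t) @ f.T        # sum_i f_i(x) f_i(y) lambda_i^t
print(np.allclose(lhs, rhs))             # True
```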

One may wonder why we would want an expression like this. In fact, it is useful since
$$\begin{eqnarray*}d(t) &=& \sup_{x \in S}\|P^t(x, \cdot) - \pi(\cdot)\|_{TV} \\ &=& \sup_{x \in S}\frac{1}{2}\sum_{y \in S}\pi(y) \left|\frac{P^t(x,y)}{\pi(y) }- 1 \right| \\&=& \sup_{x \in S}\frac{1}{2}\left\|\sum_{i=2}^n f_i(x) f_i \lambda_i^t \right\|_{L^1(\pi)}  \\ &\leq& \sup_{x \in S}\frac{1}{2} \left\|\sum_{i=2}^n f_i(x) f_i \lambda_i^t \right\|_{L^2(\pi)} \\ &=& \sup_{x \in S}\frac{1}{2}\sqrt{\sum_{i=2}^n f_i^2(x) \lambda_i^{2t}} \end{eqnarray*}$$

Continuing the analysis, we obtain the inequality
$$\boxed{(t_{rel} - 1)\log(\frac{1}{2\epsilon}) \leq t_{mix}(\epsilon) \leq \log(\frac{1}{2\epsilon \sqrt{\pi_{min}}})t_{rel}}$$
where the relaxation time is defined as
$$t_{rel} = \frac{1}{\gamma}$$

Spectral technique: viewpoint from geometry

The result obtained from the eigenvalues of the operator is powerful, since it gives both upper and lower bounds. However, computing the spectral gap is not so easy in general, so many other methods have been proposed to estimate the gap $\gamma$. Here we list two of them: via the optimal constant of the Poincaré inequality, and via Cheeger's inequality.

If we define the Dirichlet energy under the measure $\pi$ as
$$\begin{eqnarray*}E(f,f) &=& \sum_{e}Q(e) (\nabla f(e))^2\\ Q(x,y)  &=& \pi(x)P(x,y)\end{eqnarray*}$$
then the classical discrete Poincaré inequality gives
$$\boxed{Var_{\pi}(f) \leq \frac{1}{\gamma} E(f,f)}$$
The constant is optimal, so if we prove such an inequality with another constant $C$, we necessarily have $t_{rel} \leq C$, and we get an upper bound on the mixing time.

Another method comes from the famous Cheeger inequality. The expansion ratio of a set is
$$\Phi(A) = \frac{Q(A, A^C)}{\pi(A)}$$
and the expansion ratio of the graph is
$$\Phi_* = \inf_{\pi(A) \leq 1/2} \Phi(A).$$
The famous Cheeger inequality tells us that
$$\frac{\Phi_*^2}{2} \leq \gamma  \leq 2\Phi_*$$

We remark that a good lower bound for the mixing time is
$$t_{mix} \geq \frac{1}{4\Phi_*}$$
which means that if the graph has a bottleneck (is poorly connected), it takes a long time to mix the chain.


Comparison method and continuous-time Markov chains

All the mixing-time results can also be converted to continuous time, where the jump times form a Poisson process but the state space is still finite. Following the same routine, we get
$$d(t) \leq \frac{1}{2}d_2(t) = \frac{1}{2}\sqrt{\sum_{j=2}^n e^{-2t\mu_j}} , \qquad \mu_j = 1-\lambda_j$$

The comparison theorem says that for two Markov chains $X,Y$, if $E_X(f,f) \leq AE_Y(f,f)$ for all $f$, then $\mu_j^X \leq A \mu_j^Y$. This lets us analyse one Markov chain given the information of another chain that is already well understood.


Coupling from the past

Propp and Wilson proposed an interesting algorithm called coupling from the past, which gives an exact sample of the invariant measure in some specific cases, for example the Ising model. The simulation requires a partial order on the state space that is preserved by the dynamics. Unlike ordinary MCMC, we come from the past. The coupling technique says that, generally, stationarity is attained once a coupling has coalesced. Here, we check whether
$$G(-t,0)(-1) = G(-t,0)(1)$$
where $-1,1$ represent respectively the minimal and maximal elements. If it is the case, we take this value; if not, we prepend a further segment of simulation before time $-t$, reusing the same randomness, until the two trajectories meet. The monotonicity says that it is as if we simulated from $-\infty$ and all the trajectories coalesced.

Finally, we remark that although this algorithm terminates almost surely, its running time is a random variable, so there is a risk that it takes a long time before ending.
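Below is a minimal sketch of monotone coupling-from-the-past (pure Python; the chain - a lazy ±1 walk on $\{0,\dots,N\}$ whose invariant law is uniform - is my toy choice, not the Ising dynamics). The two key points are visible in the code: the same randomness is reused when we go further into the past, and only the minimal and maximal trajectories need to be tracked.

```python
import random

N = 10                    # states 0..N with a monotone lazy walk

def update(x, u):
    """Monotone update: the same uniform u drives every trajectory."""
    return min(x + 1, N) if u < 0.5 else max(x - 1, 0)

def cftp(rng):
    us = []                              # randomness for times -1, -2, ... (reused!)
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        lo, hi = 0, N                    # minimal and maximal states at time -T
        for t in range(T - 1, -1, -1):   # run from time -T up to time 0
            lo, hi = update(lo, us[t]), update(hi, us[t])
        if lo == hi:
            return lo                    # every trajectory has coalesced
        T *= 2                           # go further into the past

rng = random.Random(3)
samples = [cftp(rng) for _ in range(5000)]
# This chain is doubly stochastic, so the exact law is uniform on {0,...,N}.
print([round(samples.count(k) / len(samples), 3) for k in range(N + 1)])
```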


Cover time

The cover time is another concept, similar to the mixing time: it is the time needed to visit every state at least once. Matthews' inequality gives the bound
$$t_{cov} \leq \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) t_{hit}$$

One surprising result is the relation between the cover time and the discrete Gaussian free field (DGFF):
$$t_{cov} \sim |E| \left(\mathbb{E}\left(\max_{x \in S} h_x\right)\right)^2.$$

Others

We conclude that the mixing time has to be estimated carefully. The coupling method has a lot of flexibility but requires good observation and technique, while the spectral method looks more like the analysis of the heat equation. We can also reformulate the subject in the language of electrical networks, so as to use the analysis of the structure to estimate the mixing time.

Wednesday, July 12, 2017

The Devotion of Suspect X

Today I watched the film adapted from Keigo Higashino's novel online. Apparently China, Japan and Korea have all made adaptations; the one I watched was the Japanese version, and I thought it was pretty good. Here are a few casual thoughts.

Perhaps because of my own field, I identified strongly with the male lead and could basically guess what he would do. There are quite a few people in this world who truly love mathematics and have a real command of it, but in the end only a handful make epoch-defining contributions; most are really just participants in the overall enterprise. So mathematicians who feel their ambitions unfulfilled and their years wasted, and those who finally face reality and become math teachers, actually form a rather large group. People who do mathematics are rational and single-minded, yet sometimes their emotions are inexpressibly intense; once their blood is up, they can be more fanatical than ordinary people. But the nature of the profession means that their interpersonal skills, or their view of the subjective and objective world, are always a little bit off.

Where is the problem? From the mathematical point of view, theoretical frameworks and models can be constructed by oneself; if applications are of no concern, one can even ignore whether the framework and the model are reasonable. This differs from physicists: they may sometimes lack logical rigor, but they never lack sharp observation, and their ultimate goal is to explain phenomena; theory is merely an aid.

With this understood, interpreting Ishigami and Yukawa becomes much simpler. Ishigami is precisely that hot-headed mathematician with low emotional intelligence. He says he has done so much - but does the other person understand? Accept it? Want it? He asks nothing and makes no attempt to find out. The scheme he designs is, in a sense, a construction of his own making, truly treating everyone as gears in a machine. I think Ishigami's pain at the end is partly the helplessness of one-sided devotion that cannot be accepted, and partly the feeling that his teammate was foolish.

Calling the teammate foolish is going a bit far; I do not mean to disparage the female lead. Everyone makes their own choices in life: if you accepted someone's love steeped in injustice, on top of your own guilt, could you really live comfortably?

Yukawa's choice is a helpless one. Many people say the two appreciate each other as kindred spirits, and that he reveals the truth at the end just to settle who is the better mind. I think the mutual appreciation is certainly there, but his conversation with Utsumi already shows that he understands how cruel the truth is, and that he agonizes over whether to let the scheme stand. Yet physics itself is the pursuit of factual truth: a logically rigorous, self-consistent system is not accepted as correct for that reason alone. I think that is why he chooses to expose this cruel scheme.

At the very end: I think this film is quite instructive for the intellectual and psychological education of mathematics students. To some extent, we cannot live only in our own world. People say love is poison for mathematics; it is not really that extreme. But if one can give up everything for love, even one's own future, why not choose for the two to live on well together? Ishigami could have confessed everything and chosen to accompany the mother and daughter for the rest of the road, no matter what.

Tuesday, June 20, 2017

Seminar 20/06/2017: A day for Yves Meyer

It has been a long time since I last wrote on my blog. Today, the seminar at ENS Cachan is a day celebrating the Abel Prize laureate Yves Meyer, the founder of wavelet theory. I should say that I already knew the story of wavelets quite well when I was at university, and it was at one time my first choice of thesis topic. Consequently, revisiting this material is like a fond memory for me. Here, I recall a few interesting talks.

A wavelet basis is a collection of bases of $L^2(\mathbb{R})$, generated by translation and change of scale, orthogonal, and sometimes with additional properties: decay, compact support, etc. Generally, they have the form
$$
\psi_{j,n}(t) = \frac{1}{\sqrt{2^j}} \psi \left(\frac{t - 2^j n}{2^j}\right)
$$
In dimension 2, the structure is a bit more complicated. For coding, it is as in the Fourier transform, except that we keep only the coefficients larger than a threshold, throwing away the other, less important terms. This operation lets us compress the information, and it is the basis of JPEG2000. In the implementation, one uses the pyramid algorithm. Unfortunately, as Mallat told the story, people no longer use the JPEG2000 algorithm, because the quality of JPEG is considered good enough.

The wavelet method codes the information of a signal layer by layer, block by block.
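A minimal one-level Haar sketch (numpy; the test signal and the threshold $0.1$ are my choices) to illustrate the keep-the-large-coefficients idea; real codecs use multi-level biorthogonal wavelets plus quantization.

```python
import numpy as np

def haar_step(x):
    """One level of the orthogonal Haar transform: averages and details."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_inverse(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

t = np.linspace(0.0, 1.0, 1024)
rng = np.random.default_rng(4)
signal = np.sin(2 * np.pi * 5 * t) + 0.05 * rng.standard_normal(1024)

a, d = haar_step(signal)
d_kept = np.where(np.abs(d) > 0.1, d, 0.0)   # threshold the detail coefficients
reconstruction = haar_inverse(a, d_kept)

print(np.count_nonzero(d_kept), "of", len(d), "detail coefficients kept;",
      "relative L2 error:",
      np.linalg.norm(signal - reconstruction) / np.linalg.norm(signal))
```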

But in image processing, there are other techniques. One is called the BV decomposition. The formula is
$$
Photo = Cartoon + Texture
$$
It is like the decomposition of a semimartingale. In an image, there is always a good part, like a BV function, and another bad part, like noise. We separate the two and use only one of them.

The wavelet adventure is not over. There are applications in the detection of gravitational waves, in deep learning, etc. The roots of the real-variable methods come from the complex methods, but we want a bit more than the complex methods in order to treat the non-regular case. Moreover, one hopes to uncover the secret inside Le Cun's algorithms.

I also asked a question about simulation in imaginary geometry. The answer: localization is possible, but there are always problems at the boundary. The generalization of the Dirichlet energy is feasible, but it has to be redone by hand.

Wednesday, May 31, 2017

Seminar 30/05/2017: Ricci curvature on graphs

This seminar was very interesting because it answered my question of how to define a curvature on a graph, even if, for the moment, the definition does not suffice to recover all the geometric properties. As I now work on random maps and quantum gravity, I would really like to advance one more step and attack the geometric part. In what follows, I type up several propositions from this seminar. (After Hervé Pajot.)

Review of Riemannian geometry

Let $M$ be a manifold equipped with a metric $g$. We can define the length of a curve $$L(r) = \int_0^1 \sqrt{g\left(\frac{\partial r}{\partial t}, \frac{\partial  r}{\partial t}\right)}dt$$
and the volume $$ Vol(A) = \int_A \sqrt{\det(g)}\, dx$$

Moreover, we can also define the Ricci curvature, which reflects many properties of the manifold. For example, we know that a manifold with nonnegative Ricci curvature has volume growth at most like that of the space $\mathbb{R}^d$. Above all, a lower bound on the Ricci curvature allows us to use the Poincaré inequality. Hence, we would like to have a generalization to metric spaces.

Metric spaces

Unlike on a Riemannian manifold, in a metric space the length of a curve is not well defined a priori. This is why we work in a measured metric space $(X, d, \mu)$. One construction is the Alexandrov curvature, which is invariant under the Gromov-Hausdorff convergence. Moreover, the class of manifolds with Ricci curvature bounded below is precompact in the space $(K, d_{GH})$.

Wasserstein space

In research on optimal transport, one often uses the Wasserstein distance and the Wasserstein space $(P(X), W_2)$. In fact, if the space $X$ is geodesic, so is its Wasserstein space; hence there exist geodesics between two distributions. More precisely, the elements of $(P(X), W_2)$ are probability distributions, and a geodesic $\mu_t$ is the shortest curve connecting $\mu_0$ and $\mu_1$. In this way, we know how to find the optimal transport: it is just finding this geodesic.

Using this idea, mathematicians have also defined a generalization of Ricci curvature called N-Ricci curvature. If a sequence of spaces has nonnegative curvature in this sense, then its limit in the Gromov-Hausdorff sense also has nonnegative curvature. Moreover, on such spaces, we can apply the Poincaré inequality.

Generalization to graphs

Even though there are several ways to define a Ricci curvature on a graph, in my opinion they do not recover all the properties. Perhaps this is also because the objects studied by probabilists are more concrete, and the analytic description does not suffice to draw the whole picture.

Thursday, May 11, 2017

Seminar 11/05/2017: Probabilités de demain

This is the second edition of this doctoral probability seminar. Here I just cite some highlights.

2D Ising model, by Dmitry Chelkak: this is a topic that has been studied for a long time, but many questions are still open, and the most tremendous progress has been made in recent years. We may know that the scaling limit of the critical percolation interface on the hexagonal lattice is $SLE_6$, but what about the others? In fact, the scaling limit of the configuration is related to a similar random process - the conformal loop ensemble CLE, another variant of SLE. Moreover, a complete system of discrete complex analysis and discrete holomorphic tools has been developed. There are many open questions, such as the Ising model on random maps.

Recurrence relations and the dimer model, by Paul Melotti: we can find a direct correspondence between recurrence sequences and the partition function of the dimer model, which helps us express it explicitly. I asked whether we could also find the recurrence for a given partition function. The answer is negative, since the recurrence is always polynomial, and this is not the case for an arbitrary partition function - the integrability problem.

Flips of triangulations of the sphere, by Thomas Budzinski: uniform triangulations are now a very popular topic. But how do we construct them? The natural way is Monte Carlo: we start from a configuration and try to flip it, as in the MCMC Metropolis method. Using the $n^4$ growth and the bottleneck property, Thomas gives a lower bound on the mixing time of order $n^{\frac{5}{4}}$. However, the matching upper bound is missing (though he says the numerical results look like this). Can we do better?

Expanders, by Simon Coste: OK, this was the topic of my PSC at Polytechnique, and the talk still gave me some new ideas. From the random-graph point of view, the eigenvalue distribution follows a semicircle law, and this can be generalized to more general cases, such as oriented graphs. We ask whether a random map could be an expander. It seems not, but we need to add weights... In any case, spectral analysis is an important technique in random models.

A nonlinear SPDE, by Perla El Kettani: wow, I found something I had been looking forward to for a long time. I once asked whether the Gaussian free field has applications in analysis, and this is one example. If the coefficient of each element of the orthogonal basis is not only a Gaussian but a Brownian motion, we can use the stochastic analysis framework to treat the PDE. The GFF is therefore one type of noise, as one of my friends commented on my simulation. So probability and analysis interact, and I believe there will be more applications.

Deformation of random fields, by Julie Fournier: I didn't understand much, but it seems to be about deformations of random fields.

Bismut-Elworthy-Li formula, by Henri Elad Altman: Henri on fire! This talk was about the strong Feller property. Just as the transport equation can only preserve regularity while the diffusion equation improves it, we ask the same question for the semigroup. The formula applies to a general Itô process, provided the diffusion isn't too degenerate. This tool is essential for the study of some kinds of equations, like the KPZ equation.

Cost functionals for large random trees, by Marion Sciauveau: we would like to generalize cost functionals from discrete random trees to continuous trees coded by the Brownian excursion. Before taking the convergence, in fact, we can also embed the discrete tree into the continuous tree. The cost functional helps study the DC complexity.

Hypercube percolation, by Remco van der Hofstad: an exhaustive study of percolation on the hypercube. It is hard to imagine that this work was done without simulation. The critical point is characterized by
$$
\mathbb{E}_{p_c}[|C_0|] = 2^{n/3}
$$
and the critical window is about $2^{-n/3}$, which means that within this window we see a drastic phase transition.

I think a seminar like this gives us a quick understanding of the different directions of probability, since the subjects in this domain are so varied. On the other hand, making new friends during the seminar is also nice. Still, seeing those who grew up together from prépa and master's and have become collaborators makes one envious.



Wednesday, May 3, 2017

Seminar 03/05/2017: Curves of measures

Summary of a talk by Hugo Lavenant on the optimal transport problem.

Optimal transport problem
On a set $\Omega$ we can define the set of probability measures $\mathcal{M}_1(\Omega)$. An optimal transport problem looks for a way of transporting mass which starts from a probability $\rho_0$ and ends at another one $\rho_1$, with minimal cost.

Mathematically, the set of transport plans is
$$
\Pi(\rho_0, \rho_1) = \{\pi \in \mathcal{M}_1(\Omega \times \Omega) \mid \forall A \in \mathcal{B}(\Omega) , \pi(A \times \Omega) = \rho_0(A),  \pi(\Omega \times A) = \rho_1(A)\}
$$
and the cost is given by the Wasserstein distance
$$
W^2_2 (\rho_0, \rho_1) = \min_{\pi \in \Pi(\rho_0, \rho_1) } \int_{\Omega \times \Omega} |x - y|^2 d\pi(x,y)
$$

Of course, one can generalize the distance here with another norm, or use a different cost function.
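In dimension 1, the optimal coupling for the quadratic cost is monotone, so $W_2$ between two empirical measures is computed by simply sorting both samples. A small numpy sketch (the two Gaussian samples are my choice; for Gaussians with equal variance, $W_2$ is the distance between the means):

```python
import numpy as np

def w2_empirical(x, y):
    """W_2 between two empirical measures with the same number of atoms:
    in dimension 1, the optimal matching is the monotone (sorted) one."""
    x, y = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((x - y) ** 2))

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 100000)
y = rng.normal(2.0, 1.0, 100000)
print(w2_empirical(x, y))   # close to 2.0, the distance between the means
```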


Curves of measures
Rademacher's theorem tells us that in the space $\mathbb{R}^d$, a Lipschitz function is differentiable almost everywhere, so we can apply the Newton-Leibniz formula; in fact, this is a direct result about absolutely continuous functions. Now let us define a flow
$$
\rho : [0, 1] \rightarrow \mathcal{M}_1(\Omega)
$$
which is Lipschitz in the sense of the Wasserstein distance. Then the metric derivative
$$
|\partial \rho_t| = \lim_{h \rightarrow 0} \frac{W_2 (\rho_{t+h}, \rho_{t})}{h}
$$
exists for almost every $t$, and $W_2 (\rho_{s}, \rho_{t}) \leq \int_{s}^{t} |\partial \rho_u| du$.

The interest of curves of measures is to construct a measure on the space $\Gamma = C([0, 1], \Omega)$, the path space. A measure on such a space is a bit complicated, but recall that $\Gamma$ is a Polish space. Then, for a flow, there exists a measure $\mu \in \mathcal{M}_1(\Gamma)$ such that
$$
 \rho_t(A) = \mu [\gamma \in \Gamma, \gamma(t) \in A]
$$
and then the integral identity
$$
\int_0^1  |\partial \rho_t|^2 dt = \int_{\Gamma} \left(\int_0^1  |\gamma'(t)|^2 dt\right)d\mu(\gamma)
$$
A final remark: this approach closely connects analysis and probability theory, and above all it has the spirit of path integrals. Imagine: might we have a chance to use it in random geometry? We shall see.


Saturday, April 29, 2017

Probability review (2): convergence in law - general theory on metric spaces

The theory of convergence on probability spaces is perhaps more complicated than we thought. Of course, we have known the definitions of a.s. convergence, convergence in probability and convergence in law since we learned elementary probability - or perhaps since we learned real analysis. But they have more advanced versions. For example, we can ask the following questions:

- Can we speak of convergence in law on an abstract space, or at least of the convergence of random processes?
- Does convergence in law really have nothing to do with a.s. convergence?
- How do we generalize the Poisson point process?

Personally, it was last year, when I was reading papers on random geometry, that I came to find that the context of these terminologies is much larger. Fortunately, the giants have already built a solid and fairly general foundation for us. I believe probabilists like Lévy, Meyer and Neveu would understand today's work, even though the models are more varied and sophisticated.


Convergence in law on $\mathbb{R}^d$

The definition of convergence in law on $\mathbb{R}^d$ is well known:

Definition (convergence in law on $\mathbb{R}^d$):
We say $\mu_n \Rightarrow \mu$ if $\forall f \in \mathcal{C}_b({\mathbb{R}^d})$, $\mu_n(f) \rightarrow \mu(f)$.

There exist several characterizations of this definition.
Characterization 1: by Lévy's theorem and the characteristic function.
Characterization 2: in $\mathbb{R}$, the distribution functions satisfy $F_n(x) \rightarrow F(x)$ at every continuity point of $F$.
Characterization 3 (tightness): in $\mathbb{R}$, if $\forall \epsilon > 0$ there exists $A$ such that $\sup_{n} \mu_n(\mathbb{R} \backslash [-A,A]) < \epsilon$, then there exists a subsequence $\mu_{\phi({n})} \Rightarrow \mu$.


Convergence in law on a metric space

A first question is: "can we generalize these results to a more abstract space?" Since we know how to define the bounded continuous functions $ \mathcal{C}_b((E,d),\mathbb{R})$ on a metric space (in fact, a T4 space suffices), and recalling the construction of measures on a metric space - via the Riesz representation theorem and positive Radon functionals - it is indeed possible to speak of convergence on an abstract space.

A measure on such a space has a lot of regularity: $$\begin{eqnarray*}\forall A \in \mathcal{B}(E), \mu(A) &=& \inf \{\mu (O), O \text{ open }, A \subset O\} \\ &=& \sup\{\mu (F), F \text{ closed }, F \subset A\}\end{eqnarray*}$$
We say $\mu_n \Rightarrow \mu$ if $\forall f \in \mathcal{C}_b((E,d),\mathbb{R})$, $\mu_n(f) \rightarrow \mu(f)$.

This definition exactly parallels the one on the space $\mathbb{R}^d$. We state a quite general theorem; in fact, it is also used in the proofs of the previous results.

Theorem (equivalent definitions):
  1. $\mu_n \Rightarrow \mu$.
  2. $\forall O$ open, $\liminf_{n \rightarrow \infty} \mu_n(O) \geq \mu(O)$.
  3. $\forall F$ closed, $\limsup_{n \rightarrow \infty} \mu_n(F) \leq \mu(F)$.
  4. $\forall A \in \mathcal{B}(E) $ with $\mu(\partial A) = 0$, we have $\lim_{n \rightarrow \infty} \mu_n(A) = \mu(A)$.
  5. $\forall f$ bounded and $\mu$-a.s. continuous, $\lim_{n \rightarrow \infty} \mu_n(f) = \mu(f)$.

Polish spaces and metrization of $\mathcal{M}_1(E)$

When we enter the more general setting - that is, the case of topological spaces - we need to work on a space that is a little complicated but still satisfies properties similar to those of $\mathbb{R}^d$: the Polish space.

Definition (Polish space):
A Polish space $(E,d)$ is a separable, complete metric space.

The interest of this space is that it gives a better description of convergence in law. In fact, since measures form a subset of the functionals, it is natural to study the topology of the space $\mathcal{M}_1(E)$. Moreover, the following result is really important.

Proposition (topology of $\mathcal{M}_1(E)$):
Let $(E,d)$ be a Polish space. Then we can equip $\mathcal{M}_1(E)$ with a distance which induces the narrow (weak) topology on $\mathcal{M}_1(E)$. Moreover, $\mathcal{M}_1(E)$ is itself a Polish space under this distance.


It is a miracle. Once we have a distance on the space, the study of $\mathcal{M}_1(E)$ becomes much easier. The proof is quite interesting. Thanks to the Stone-Weierstrass theorem, the space of uniformly continuous functions is in fact separable, so we can pick a countable dense family $(g_n)$ and define
$$
dist(\mu, \nu) = \sum_{n \geq  1}\frac{1}{2^n}|\mu(g_n) - \nu(g_n)| \wedge 1
$$
This distance realizes the same topology and the separability. Completeness, on the other hand, has in fact nothing to do with the topology: there are distances inducing the same topology but with different completeness properties. The proper complete distance is more complicated; it is the Lévy-Prokhorov distance.


Then, convergence in law of measures is in effect a question of compactness in $\mathcal{M}_1(E)$. The notion of tightness and Prokhorov's theorem are useful in this situation. Finally, we recall one last result - the Skorokhod representation.

Skorokhod's theorem:
Let $\mu_n \Rightarrow \mu$. Then there exists a space $(\Omega,  \mathcal{F}, \mathbb{P})$ carrying random variables $X_n$ (resp. $X$) with laws $\mu_n$ (resp. $\mu$) such that $X_n \rightarrow X$ a.s.

One really feels that this theorem can be used to cheat in a few proofs.

Saturday, April 22, 2017

Probability review (1): monotone classes and the construction of measures

There are always techniques and subtle points in probability theory. In this series of articles, we collect some small techniques used throughout probability theory, measure theory and stochastic processes.

This first article is devoted to the construction of measures and their uniqueness, thanks to the monotone class theorem. In English, it is also called the $\pi$-system / $\lambda$-system lemma.

Definition (monotone class)
$\mathcal{M} \subset \mathcal{P}(E)$ is called a monotone class if and only if it satisfies the following properties:

  1. $E \in \mathcal{M}$
  2. If $A,B \in \mathcal{M}, A \subset B$, then $B \backslash A \in \mathcal{M}$
  3. If $A_n \subset A_{n+1} \cdots, A_n \in \mathcal{M}$, then $\bigcup_n A_n \in \mathcal{M}$



Lemma (monotone class)

Let $\mathcal{C} \subset \mathcal{P}(E)$ be stable under finite intersections; then $\mathcal{M}(\mathcal{C}) = \sigma(\mathcal{C})$.
Proof: we just list the key ideas of the proof.

  1. It suffices to show that $\mathcal{M}(\mathcal{C})$ is also stable under finite intersections.
  2. For $A \in  \mathcal{C}$, we define $\mathcal{M}_1(\mathcal{C}) = \{B \in \mathcal{M}(\mathcal{C}) \mid A \cap B \in \mathcal{M}(\mathcal{C})\}$ and check that $\mathcal{M}_1(\mathcal{C})$ is also a monotone class containing $\mathcal{C}$. Since $\mathcal{M}_1(\mathcal{C}) \subset \mathcal{M}(\mathcal{C})$ and $\mathcal{M}(\mathcal{C})$ is the minimal monotone class containing $\mathcal{C}$, we obtain $\mathcal{M}_1(\mathcal{C}) = \mathcal{M}(\mathcal{C})$.
  3. We adapt the same strategy for $A \in \mathcal{M}(\mathcal{C})$, show that $\mathcal{M}(\mathcal{C})$ is stable under finite intersections, and conclude.


Theorem (uniqueness of measures)

Let $\mu, \nu$ be two measures on $(E, \mathcal{A})$ and let $\mathcal{C} \subset \mathcal{A}$ be stable under finite intersections with $\sigma(\mathcal{C}) = \mathcal{A}$. If the two measures coincide on $\mathcal{C}$, either of the following conditions ensures that they are identical on $\mathcal{A}$:

  1. $\mu(E) = \nu(E) < \infty$
  2. $\exists E_n \in \mathcal{C}, E_n \subset E_{n+1}, E = \bigcup_n E_n, \mu(E_n) = \nu(E_n) < \infty$

Proof: the main line is to show that the set $$\mathcal{G} = \{A \in \mathcal{A} \mid \mu(A) = \nu(A)\}$$
is a monotone class, which implies that the equality extends to $\mathcal{A}$. The first condition already suffices for probability measures, and the second generalizes the theorem to the case of infinite measures, by approximation with the sequence of finite measures defined as $$\mu_n(\cdot) = \mu(\cdot \cap E_n)$$



Thursday, April 20, 2017

SLE (2) : Property of locality

One of the interests of studying $SLE_{\kappa}$ is that it describes the critical behaviors of many random models, such as the self-avoiding walk, the interface of the Ising model, and the uniform spanning tree. Here, we give an example: the locality property of $SLE_{6}$ and its connection with the site Ising model in the critical case.

Site Ising model

We first introduce the model. This is the so-called site Ising model: each site independently has probability $\frac{1}{2}$ to be black or white. We suppose that the negative axis is black and the positive axis is white, and a process starts from $0$ and explores only the interface between the two. It has two properties:
  1. Conformal invariance. It is believed that after a conformal mapping, the process follows the same law.
  2. Domain Markov property. When the mesh of the lattice goes to $0$, the future of the process should be independent of the part already explored, since the trace is determined locally.
One expects a third property, called locality: the process in a sub-domain has the same law until it exits that domain, again because the trace is determined locally.

Conformal invariance and the domain Markov property imply that its scaling limit should be an $SLE_{\kappa}$.

$SLE_6$ and locality

But why should this $SLE_{\kappa}$ be $SLE_{6}$?

That is forced by the locality property. In general, we can define the conformal image of $SLE_{\kappa}$ in a domain $D$, but it does not follow the same law as a random process. For example, for a conformal mapping $$ \phi : D \rightarrow \mathbb{H}, $$ the Loewner equation becomes $$\partial_t \tilde{g}_t (z) = \frac{\partial_t hcap(\tilde{K}_t)}{\tilde{g}_t (z) - \tilde{U}_t}$$

Here $\tilde{U}_t$ is not necessarily a standard Brownian motion, and $\kappa = 6$ is exactly the case that makes it a real one.

We skip the technical calculation and just talk about its meaning. It means that before going out of the domain, the image $\phi(K_t)$ also follows an $SLE_6$ process. This implies the property of locality.


Cardy formula

The most interesting story about $SLE_6$ is its behavior on the equilateral triangle. We first calculate a hitting probability of $SLE_6$ on $\mathbb{H}$: $\mathbb{P}(\tau_{-y} < \tau_{1}) = F(\frac{y}{1+y}) $ where $$F(z) = c^{-1}\int_0^z \frac{1}{u^{\frac{2}{3}} (1-u)^{\frac{2}{3}}} du$$
which coincides with the Schwarz-Christoffel formula of complex analysis: it sends $0, 1, \infty$ to $0, 1, e^{\pi i /3}$ respectively and satisfies $$F(1+y) = 1 + e^{2\pi i / 3} F\left(\frac{y}{1+y}\right).$$

We construct another conformal mapping $\Phi : \mathbb{H} \rightarrow \triangle$ which sends $0, 1, -y$ to $0, 1, e^{\pi i /3}$ respectively. We can show that in this case, $\Phi(\infty) = F(1+y)$. So we get the property that $$\mathbb{P}(\tau_{-y} < \tau_{1}) = \frac{|\Phi(\infty) - \Phi(1)|}{|\Phi(-y) - \Phi(1)|}$$

Using the locality of $SLE_{6}$, this means that for an $SLE_{6}$ going from $0$ to $\Phi(\infty)$ in $\triangle$, the position of the first hit of each side is distributed proportionally to length. A similar argument shows that the same process from $0$ to $1$ has a uniform distribution for its first hit.

More generally, two $SLE_{6}$ in the same triangle, one from $0$ to $1$ and another from $0$ to $e^{\pi i/3}$, share the same law until the first hit of the opposite side.

Monday, April 17, 2017

Hard sphere model (1) : Introduction

The hard sphere model starts from the very basic assumptions of classical mechanics and attempts to prove the convergence from the microscopic model to the macroscopic model.

Notation and basic assumption

We start from $N$ particles following the rules of classical mechanics, each one described by $z_i = (x_i, v_i) \in \mathbb{T}^d \times \mathbb{R}^d, i = 1, 2, 3, \cdots N$. After a collision with impact direction $r$, the change of velocities is expressed as
$$
\begin{eqnarray*}
v' &=& v - ((v - v_1)  \cdot r) r \\
v'_1 &=& v_1 - ((v_1 - v) \cdot  r) r
\end{eqnarray*}
$$

The mechanics can be reversed once we change the direction of time and of the velocities; in other words, this system is reversible. To avoid an abuse of notation, we always define the evolution of the system on a suitable subset of all configurations, namely
$$
D^N_{\epsilon} = \{(z_1, z_2, \cdots z_N), \forall i \neq j, |x_i - x_j| > \epsilon \}
$$
We define the density of the configuration on a product probability space on $\mathbb{T}^d \times \mathbb{R}^d$. Some analysis shows that we can define the dynamics for every configuration outside a null set.


Boltzmann-Grad scaling and Boltzmann equation

For a single particle, the density evolves by the free-transport Liouville equation $$ \partial_t f(t,x,v) + v \cdot \partial_x f(t,x,v) = 0 $$

We generalize this equation to the $N$ hard sphere model; the Liouville equation is then

$$ \partial_t f_N(t,Z_N) + \sum_{i = 1}^N v_i \cdot \partial_{x_i} f_N(t,Z_N) = 0 $$ 

Since we would like to study the chaotic behavior of the particles, a natural idea is to calculate the invariant measure - we know the limit measure is always an invariant measure. In this case, the invariant measure is the Boltzmann measure, which is

$$M_{N, \beta} = \frac{1}{\mathcal{Z}_N} \mathbb{1}_{D^{N}_{\epsilon}}(Z_N) M_{\beta}^{\otimes N}(V_N)$$ 

This is not exactly a Maxwell-Boltzmann product distribution, because of the exclusion condition. But some detailed analysis tells us that we can do an "almost" factorization of the measure into a product of probabilities:

$$|(M_{N, \beta}^{(s)} - M_{\beta}^{\otimes s}) \mathbb{1}_{D^s_{\epsilon}}| \leq C^s \rho M_{\beta}^{\otimes s}$$

In further posts, we will study the convergence in different contexts precisely. We will show that the distribution is very close to the distribution of the solution of the Boltzmann equation $\partial_t f + v \cdot \nabla_x f = \alpha Q(f,f)$.

Entropy paradox

Entropy, introduced by Boltzmann and Shannon, is a genius idea to describe the degree of disorder: $H(t) = \int_{D^N_{\epsilon}}  - f_N(t, Z_N) \log{f_N(t,Z_N)} dZ_N$. According to this definition, the entropy of the solution of the hard sphere model is constant, while that of the solution of the Boltzmann equation increases.

In our course, we list two models to explain this paradox. The two baby models, the Kac ring model and the Ehrenfest model, are both reversible, so they have a period or a Poincaré recurrence result. However, chaos appears much faster than the period. That is also the case in the hard sphere model: we would perhaps see the recurrence, but it takes an extremely long time, and during this time the chaos repeats again and again. So how do you regard it - a deterministic system or a random system? You have the answer.
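A tiny simulation of the Kac ring makes the point concrete (numpy; the ring size $N = 500$ and the marker density $0.1$ are my choices). The dynamics is perfectly deterministic and reversible, the fraction of white balls relaxes toward $1/2$, and yet at time $2N$ every ball has crossed every marked edge exactly twice, so the initial state provably recurs.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 500                              # number of sites on the ring
markers = rng.random(N) < 0.1        # each edge carries a marker w.p. 0.1

balls = np.ones(N, dtype=bool)       # start with all balls white (True)
fractions = []
for t in range(2 * N + 1):
    fractions.append(balls.mean())
    balls = np.roll(balls, 1)        # every ball moves one site clockwise
    balls ^= markers                 # crossing a marked edge flips the color

# Apparent relaxation toward 1/2 ... and exact recurrence at time 2N.
print("t=0:", fractions[0], " t=N/2:", fractions[N // 2],
      " t=2N:", fractions[2 * N])
```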

Monday, April 3, 2017

Gumbel distribution

A very short note on the Gumbel distribution.

Let the $X_i$ be i.i.d. with distribution $\mathcal{Exp}(1)$; then $$\max_{1 \leq i \leq n}X_i - \log{n} \xrightarrow{(d)} G$$
where $G$ denotes a Gumbel distribution.

The proof is easy; we just calculate the distribution function of this random variable:
$$
\mathbb{P}[\max_{1 \leq i \leq n}X_i - \log{n} \leq y] = \mathbb{P}[\max_{1 \leq i \leq n}X_i  \leq y + \log{n}] = (1 - e^{- (\log{n} + y)})^n \rightarrow e^{-e^{-y}}
$$
So we conclude the convergence in distribution.
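A quick empirical check (numpy; the sample sizes are my choices): the empirical CDF of the recentered maxima matches $e^{-e^{-y}}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, trials = 1000, 2000
maxima = rng.exponential(1.0, size=(trials, n)).max(axis=1) - np.log(n)

for y in [-1.0, 0.0, 1.0, 2.0]:
    print(y, (maxima <= y).mean(), np.exp(-np.exp(-y)))
```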

Sunday, April 2, 2017

Percolation (1) : 2D Bernoulli percolation, critical point and transition of phase

Recently, I attended a series of talks on percolation theory at IHES, so I write some notes here.

Before starting, we have to notice that the terminology of percolation is adopted in different situations, generally in the context of graphs - and it could be various graphs, like the lattice $\mathbb{Z}^d$, random graphs, random maps, etc. But the phenomenon is somewhat universal: a path starts from $0$ and goes infinitely far away. In this series of talks, Hugo focuses on the situation of 2D Bernoulli percolation.

Definition and critical point

We now give some precise definitions. In the lattice $\mathbb{Z}^2$, each site has 4 neighbors, and every edge is independently present (open) with probability $p$ or absent (closed) with probability $1-p$. Then there exists a critical point $p_c$: when $p \leq p_c$, a path from $0$ to $\infty$ exists with probability $0$, while when $p > p_c$, a path from $0$ to $\infty$ exists with probability $\theta(p) > 0$.

$$
\begin{eqnarray*}
p \leq p_c &,& \theta(p) = \mathbb{P}[0 \leftrightarrow \infty] = 0 \\
p > p_c &,& \theta(p) = \mathbb{P}[0 \leftrightarrow \infty] > 0
\end{eqnarray*}
$$

In the 2D model, the critical point is $p_c = 1/2$. A simple argument supports this value. For example, if we draw the dual percolation, in which a dual edge is open when the primal edge it crosses is closed, we get a dual graph with parameter $1-p$. If we suppose that the critical point is unique, then the phase transition happens at the same time in the primal graph and in the dual graph. So $p_c = 1- p_c$ and $p_c = 1/2$.
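A Monte Carlo sketch of the transition (pure Python with a small union-find; box size $40$ and $200$ trials are my choices): the probability of an open left-right crossing of an $L \times L$ box jumps sharply around $p = 1/2$.

```python
import random

def find(parent, a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]          # path halving
        a = parent[a]
    return a

def union(parent, a, b):
    parent[find(parent, a)] = find(parent, b)

def crosses(L, p, rng):
    """Bond percolation on an L x L grid: is there a left-right open crossing?"""
    n = L * L
    LEFT, RIGHT = n, n + 1                     # two virtual boundary sites
    parent = list(range(n + 2))
    for y in range(L):
        union(parent, y * L, LEFT)             # wire up the boundary columns
        union(parent, y * L + L - 1, RIGHT)
    for y in range(L):
        for x in range(L):
            site = y * L + x
            if x + 1 < L and rng.random() < p:     # horizontal bond
                union(parent, site, site + 1)
            if y + 1 < L and rng.random() < p:     # vertical bond
                union(parent, site, site + L)
    return find(parent, LEFT) == find(parent, RIGHT)

rng = random.Random(8)
L, trials = 40, 200
for p in [0.40, 0.45, 0.50, 0.55, 0.60]:
    freq = sum(crosses(L, p, rng) for _ in range(trials)) / trials
    print(p, freq)        # sharp increase around p = 1/2
```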


Quantitative analysis 

We hope to get some stronger results. The following theorem was first obtained by Menshikov and, independently, by Aizenman and Barsky: $$\begin{eqnarray*}\forall p < p_c, \exists C_p > 0 \text{ s.t } \mathbb{P}_p[ 0 \leftrightarrow \partial B_n] \leq \exp{(-C_p n)}\\ \forall p > p_c, \exists C > 0 \text{ s.t } \mathbb{P}_p[0 \leftrightarrow \infty] \geq C(p - p_c) \end{eqnarray*}$$

Here, we denote by $\partial B_n$ the boundary of the box of radius $n$; this theorem indeed gives a quantitative estimate of the speed of decay.

Proof 1 (the proof via $\phi_p(S)$, due to Duminil-Copin and Tassion)

We write $\theta_n(p) = \mathbb{P}_p [0 \leftrightarrow \partial B_n]$ and $$\phi_p (S) = \sum_{x \in S, y \notin S, x \sim y} p\,\mathbb{P}_p[0 \overset{S}{\leftrightarrow} x].$$ We can prove the theorem in 5 steps.

  1.  We first admit the key differential inequality $$ \frac{d}{dp} \theta_n(p) \geq \frac{1}{p(1-p)}\left[\inf_{0 \in S \subset B_n} \phi_p(S)\right](1 - \theta_n(p));$$
    then we define $$\tilde{p}_c = \sup \{p : \exists S \ni 0 \text{ s.t. } \phi_p(S) < 1\}$$
    and we can prove, for $p > \tilde{p}_c$, $$\theta(p) \geq \frac{p - \tilde{p}_c}{p(1-\tilde{p}_c)}.$$
  2.  For $p < \tilde{p}_c$, we choose $S \subset B_{k-1}$ such that $\phi_p (S) < 1$ and prove $$\theta_{nk}(p) \leq (\phi_p(S))^n.$$
  3. We conclude that $\tilde{p}_c = p_c$.
  4. We verify the identity $$\frac{d}{dp}\mathbb{E}_p(X) = \sum_{e \in E} \frac{1}{p(1-p)}Cov(\omega_e, X).$$
  5. We apply it to $X = -\mathbb{1}_{\{0 \nleftrightarrow \partial B_n\}}$ and prove the key inequality of step 1.


Sunday, March 19, 2017

An Olympiad inequality Iran96

One day, Xiaoke asked me about an Olympiad inequality, which brought back a lot of beautiful memories.

For any $a,b,c \geq 0$, no two of which are zero, prove that $$(ab + bc + ca)\left(\frac{1}{(a+b)^2} + \frac{1}{(b+c)^2} + \frac{1}{(c+a)^2}\right) \geq \frac{9}{4}$$

This is a very hard inequality named "Iran96". In a WeChat post, it is said to be so hard that many students who take part in competitions cannot solve it. I believe that is a little exaggerated; even as an Olympiad question, it is not the most difficult one.

Let us do some transformations: clear the denominators by multiplying both sides by $4\prod(a+b)^2$ (all sums below are cyclic).

$$LHS = 4\sum [a^5b + b^5 a + 2a^4 b^2 + 2 a^2 b^4 + 5  a^4 bc + 3 b^3 c^3 + 13 a^3  b^2 c + 13  a^3 b c^2 + 8  a^2b^2c^2]$$
$$RHS = 90a^2b^2c^2 + 9\sum[a^4b^2 + a^2b^4 + 2a^3b^3 + 6a^3b^2c + a^2b^3c + 2 a^4bc]$$

We do some simplification, and finally we have to prove this inequality: $$\sum[4 a^5b + 4ab^5 + 2a^4bc + 2a^2b^2c^2] \geq \sum[a^4b^2 + a^2b^4 + 6b^3c^3 + 2a^3b^2c + 2a^3bc^2]$$

It reduces to three inequalities: $$\begin{eqnarray*}3\sum(a^5b + ab^5) &\geq& 6\sum a^3b^3\\\sum(a^5b + ab^5) &\geq& \sum(a^4b^2 + a^2 b^4)\\2abc(\sum a^3 + 3abc) &\geq& 2abc\sum(a^2b + ab^2)\end{eqnarray*}$$
The first is AM-GM; the second follows from $ab(a-b)^2(a^2+ab+b^2) \geq 0$.

The last one is a very famous inequality: Schur's inequality.

Monday, March 13, 2017

Wright-Fisher model and Kingman Coalescence

In the last course on ecology and probabilistic models at Polytechnique, we talked about the Wright-Fisher model and the Kingman coalescent, two models used for the modelling of the genes of a population. The former is easy to understand, and the latter, related to the theory of combinatorics, provides some very interesting formulas.

Wright-Fisher model

Suppose that in the population there exist two types of genes, A and a, and denote by $X_n$ the number of A in the population, whose size is always $N$. Then the evolution is a Markov chain, and the transition matrix is of binomial type:
$$\mathbb{P}(X^N_{n+1} =  k\mid X^N_{n} = i) = C_N^k \left(\frac{i}{N}\right)^k \left(1 - \frac{i}{N}\right)^{N-k}$$
It is easy to check that $X^N_n$ is a martingale, bounded in $L^2$, and that the chain is eventually absorbed in $\{0, N\}$, so
$$
X^N_n \xrightarrow[a.s]{n \rightarrow \infty} X^N_{\infty} \in \{0, N\}
$$
Using the optional stopping theorem, we get that $\mathbb{P}(X^N_{\infty} = N) = \frac{i}{N}$, where $i$ is the initial state.

Some other versions can also be developed, like models with mutation and selection. On the other hand, if we change the scale of time, setting $Z_t = \frac{1}{N}X^N_{[Nt]}$, the convergence of trajectories yields
$$
Z_t = Z_0 + \int_0^t \sqrt{Z_s (1 - Z_s)} dB_s
$$
which relates a discrete model to a continuous random process.
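A small simulation sketch of the fixation probability (numpy; $N = 50$ and $2000$ runs are my choices), checking the martingale answer $i/N$:

```python
import numpy as np

rng = np.random.default_rng(9)

def fixation_frequency(N, i, trials=2000):
    """Run Wright-Fisher chains until absorption; fraction fixed at N."""
    x = np.full(trials, i)
    active = np.ones(trials, dtype=bool)
    while active.any():
        x[active] = rng.binomial(N, x[active] / N)   # binomial resampling
        active = (x != 0) & (x != N)
    return (x == N).mean()

N = 50
for i in [5, 25, 45]:
    print(i, fixation_frequency(N, i), "theory:", i / N)
```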


Kingman coalescence model


The state space is the set of partitions of $\{1,\dots,N\}$ and the initial state is $X_0 = \{1\}\{2\}\cdots\{N\}$. We denote by $T_i$ the $i$-th jump time; the waiting time before the $i$-th jump follows the law $\mathcal{Exp}\left(\binom{N-i+1}{2}\right)$, and at each jump two blocks chosen uniformly merge into one. Some obvious properties are observed.

1. At each jump, the number of blocks decreases by one.

2. The expectation of the total coalescence time is $\sum_{i=1}^{N-1} \frac{2}{(N-i+1)(N-i)} = 2(1 - \frac{1}{N})$, so it stays bounded as $N$ grows.

3. We define the genealogical relation $\pi' \rightarrow \pi$ if the latter is generated by merging two blocks of the former. The transition matrix is
$$\mathbb{P}(\pi', \pi) = \frac{2}{\sharp \pi' \cdot (\sharp \pi' -1)} \mathbb{1}_{\pi' \rightarrow \pi}$$

However, the most beautiful formula is
$$\mathbb{P}(X_{T_{N-k}} = \pi) = \frac{(N-k)!\, k!\, (k - 1)!}{N!\, (N-1)!} \prod_{i = 1}^k (\sharp B_i)!$$
for $\pi = \bigsqcup_{i=1}^k B_i$.

The proof is just an induction, but the structure of the formula is really beautiful, isn't it?

Thursday, March 9, 2017

MMB (2) : Convergence of Galton-Watson

Before talking about the martingale theory of branching Brownian motion, we prefer to talk more about the Galton-Watson model, the most classical model. We will discuss the long-time behavior of the population.

Martingale in Galton-Watson

We denote by $m$ the expectation of the offspring distribution, that is,
$$
m = \sum_{k=1}^{\infty} k p_k
$$
A simple result about extinction is that if $m \leq 1$, the population almost surely dies out. But if $m > 1$, the population has a positive probability to survive, and the extinction probability is the smallest fixed point of the generating function:
$$
q = \phi(q)
$$

If we want to say more about the long-time behavior: in the sub-critical case, we have the quasi-stationary theory, which is another story, from my EA project. In the super-critical case, we know that
$$
W_n = \frac{Z_n}{m^n}
$$
is a nonnegative martingale, where $Z_n$ is the size of the population at generation $n$. Nonnegativity implies the convergence of the martingale:
$$
W_n \xrightarrow[p.s]{n \rightarrow \infty} W_{\infty}
$$
However, a natural question is in which cases we have a U.I. or $L^1$ convergence. This has another meaning in terms of change of probability, and we will study it later. A special case is when $\mathbb{E}(Z^2_1)  <  \infty$; then we have
$$
\begin{eqnarray*}
\mathbb{E}(W^2_{n+1}) & \leq &\mathbb{E}(W^2_{n}) + \frac{Var(Z_1)}{m^{n+2}} \\
\Rightarrow \sup_{n} \mathbb{E}(W^2_n) & < &\infty
\end{eqnarray*}
$$
So we have convergence in the $L^2$ sense, which implies convergence in the $L^1$ sense.
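A quick simulation of the martingale $W_n$ (numpy; Poisson($1.5$) offspring is my choice, and it satisfies the $Z_1 \log Z_1$ condition discussed below): some runs stabilize at a positive limit, the others hit $0$, i.e. extinction.

```python
import numpy as np

rng = np.random.default_rng(10)
m = 1.5                              # supercritical mean offspring number

def W_trajectory(generations=25):
    """Simulate Z_n with Poisson(m) offspring; return W_n = Z_n / m^n."""
    z, w = 1, []
    for n in range(1, generations + 1):
        z = rng.poisson(m, size=z).sum() if z > 0 else 0
        w.append(z / m ** n)
    return w

for _ in range(8):
    print(round(W_trajectory()[-1], 3))   # a positive limit, or exactly 0
```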

Generally, using the branching property, we obtain that $\mathbb{P}(W_{\infty} = 0)$ is also a solution of $\phi(q) = q$. A priori, we have no idea whether it is the smaller one. But in the case $\mathbb{E}(W_{\infty}) = 1$, we know that in the super-critical case it is not $1$. Then we get that
$$
\mathbb{P}(W_{\infty} = 0) =  \mathbb{P}(\text{Extinction})
$$
so, conditionally on non-extinction, we have an exponential growth, except that we don't know the constant.

The exact equivalent condition for the convergence will be discussed at the end; we first discuss the Galton-Watson process with immigration in the second part.

Galton-Watson with immigration

In the Galton-Watson model with immigration, at each step we have not only the reproduction but also the immigration of $Y_n$ individuals. We can of course couple $Z_n$ with a classical Galton-Watson process $X_n$; then, whether $\frac{Z_n}{m^n}$ converges depends on the integrability of $\log^+(Y_1)$.

We skip the technical part; the result is that $$\begin{eqnarray*}\mathbb{E}(\log^+{Y_1}) < \infty \Rightarrow \lim_{n \rightarrow \infty} \frac{Z_n}{m^n} = c < \infty \\ \mathbb{E}(\log^+{Y_1}) = \infty \Rightarrow \limsup_{n \rightarrow \infty} \frac{Z_n}{m^n} =  \infty\end{eqnarray*}$$

Change of probability and Kesten-Stigum theorem

The Kesten-Stigum theorem says that, in the super-critical case, the following three statements are equivalent:
\begin{eqnarray}
\mathbb{E}(W_{\infty}) &=& 1 \\
\mathbb{P}(W_{\infty} > 0 \mid \text{non-extinction}) &=& 1  \\
\mathbb{E}(Z_1 (\log{Z_1})^+) &<& \infty
\end{eqnarray}

The idea is very technical. To prove $\mathbb{E}(W_{\infty}) = 1$, we transform the problem into a change of probability. The new process is a size-biased Galton-Watson model, which can also be considered as a model with immigration. We then apply the result on the model with immigration. The technical part needs the decomposition of measures and the change of probability along a filtration.


--------------------------------------------------------------------------------------------------------------------------
Some remarks, after two years.

In 2017, I didn't understand every part of this story, although the above gives almost all the points. The Kesten-Stigum theorem is important since it studies the martingale of the Galton-Watson process. The critical and sub-critical cases are easy: extinction. However, the super-critical case still has two possibilities: extinction or exponential growth. The theorem tells us that if there is no extinction, there is exponential growth.

Secondly, to study $W_n = \frac{Z_n}{m^n}$, we treat it via a change of probability, i.e. the probability $\mathbb{Q}$ of a Galton-Watson process with immigration. We know that a measure can be decomposed, with respect to another, into an absolutely continuous part and a singular part. Moreover, in the martingale case, it is the event $\{W_{\infty} = \infty\}$ that carries the singular part: the infinite part cannot be seen by the absolutely continuous part, so we have to treat it separately.

The Seneta lemma part is always technical. In fact, the immigrants of each generation contribute to the population with a discount ratio. The criterion on $\log^+ Y$ often appears in the case of random environments.

Tuesday, February 28, 2017

SLE (1) : A magical random evolution

When I came to France, while I stayed in language school, I spent a long time thinking about my future and my research field. One day, I read an introductory article which stated a theorem:

"The Hausdorff dimension of the frontier of Brownian motion is  $\frac{4}{3}$"



I knew the definition of Hausdorff dimension. A dimension measures a fractal object: a good definition, but very hard to compute in maths. Usually, mathematicians give an upper bound and a lower bound but no exact value. How do we reach it?

Some further searching told me a name - Schramm-Loewner Evolution, a magical random evolution relating many different models in maths and physics, especially those with a fractal structure.

"Yes, it is the maths I want." I told myself and I begins the journey to understand it.

What is SLE

In short, SLE gives us a general method to define a growing random set $K_t$, which can be considered as the scaling limit of some other random models, such as the interface of the Ising model, the frontier of Brownian motion, and the limit of the uniform spanning tree, etc.

More surprisingly, the description of the random compact set depends only on an equation - yes, equations are the most successful method ever found by mathematicians and physicists to study natural phenomena - and moreover, SLE ties complex analysis and stochastic analysis together, so it takes advantage of many theorems from both fields.

But we have to say, there exist a lot of open problems to study, since nature is so complicated and the physical or biological models are difficult and specific enough - we have to spend a long time understanding them.

How to define a growing compact set

Classical complex analysis studies conformal mappings, and the Riemann mapping theorem tells us there is a unique conformal mapping between two simply connected domains once one point is fixed and the distortion at this point is also fixed. We denote by $\mathbb{H}$ the upper half-plane. For a compact set $K$ such that $\mathbb{H} \backslash K$ is simply connected, we have a conformal mapping

$$
\Phi : \mathbb{H} \backslash K \rightarrow \mathbb{H}
$$

We make a linear transformation (hydrodynamic normalisation) such that $\Phi$ has an analytic expansion near infinity
$$
\Phi(z) = z + \frac{2a(K)}{z} + O\left(\frac{1}{z^2}\right)
$$

We remark that this mapping $\Phi$ exists by the Schwarz reflection principle and is the only one such that
$$
| \Phi(z) - z | \rightarrow 0 \text{ as } z \rightarrow \infty
$$

Here, $a(K)$ is called the capacity of $K$, since it measures how big $K$ is. An interesting property is that if we run a 2-D Brownian motion $(Z_t)$ starting from $Z_0 = iy$ and let $\tau$ be the exit time of $\mathbb{H} \backslash K$, then
$$
2a(K) = \lim_{y \rightarrow + \infty} y\,\mathbb{E}[\operatorname{Im}(Z_{\tau})]
$$
in particular, the capacity is a non-negative real number.

There are many properties of these maps; for example, capacities add under composition of the normalized maps, and they satisfy a scaling rule.
$$
\begin{eqnarray*}
a(\Phi_1 \circ \Phi_2) &=& a(\Phi_1) + a(\Phi_2)\\
a(\lambda K) &=& \lambda^2 a(K)\\
\end{eqnarray*}
$$
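As a sanity check (a classical computation, not from the original text): for the vertical slit $K = (0, ih]$, the map $\Phi(z) = \sqrt{z^2 + h^2}$ sends $\mathbb{H} \backslash K$ onto $\mathbb{H}$ with the hydrodynamic normalisation, and near infinity
$$
\Phi(z) = z\sqrt{1 + \frac{h^2}{z^2}} = z + \frac{h^2}{2z} + O\left(\frac{1}{z^3}\right), \qquad \text{so} \quad a\big((0, ih]\big) = \frac{h^2}{4},
$$
which is consistent with the scaling rule $a(\lambda K) = \lambda^2 a(K)$.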

We would like this map to be dynamical - that is to say, we would like to define a family of mappings $g_t$ corresponding to the normalized map from
$$
\mathbb{H} \backslash K_t \rightarrow \mathbb{H}
$$
Obviously, this time, the family of compact sets $K_t$ should satisfy some conditions. In maths, it is required that $K_t$ grows locally and slowly, is increasing, and carries the good parametrization $a(K_t) = t$. We can prove that, in this case, the growing compact set can be characterized by an ODE - the Loewner equation
$$
\partial_t g_t(z) = \frac{2}{g_t(z) - U_t}
$$
The ODE is well defined as long as there is no singularity. We call $U_t$ the driving function, and at the end of the lifetime $\tau(z)$ of a point $z$ we know that
$$
g_{\tau}(z) = U_{\tau}
$$
otherwise, we can always extend the solution.
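As a sanity check, take $U_t \equiv 0$: then $\partial_t g_t(z) = 2/g_t(z)$ with $g_0(z) = z$ integrates to
$$
g_t(z) = \sqrt{z^2 + 4t}, \qquad K_t = (0, 2i\sqrt{t}],
$$
a vertical slit growing from $0$: the point $z = iy$ is swallowed exactly at time $\tau = y^2/4$, and $a(K_t) = t$ agrees with the slit computation above with $h = 2\sqrt{t}$.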

In fact, we can treat $g_t(z)$ in two ways. First, we fix $z$; then $t \mapsto g_t(z)$ is the solution of an ODE, and we follow its value in time. Second, we fix $t$; then $g_t$ becomes a conformal mapping. Generally, the first viewpoint is easier for computing values, but if we want to recover the set $K_t$, the second is more intuitive: $K_t$ consists of the points $z$ whose solution is no longer well defined at time $t$. Formally,
$$
K_t = \mathbb{H} \backslash g_t^{-1}(\mathbb{H})
$$
We also have the analytic expansion
$$
g_t(z) = z + \frac{2t}{z} + O\left(\frac{1}{z^2}\right)
$$

Some connections between harmonic functions and BM have been known for a long time, such as the fact that the image of a BM under a conformal mapping is again a BM after a time change. The connection between this equation and probability theory is to make the driving function $U_t$ a random process like BM. We will state it in the next section.


Chordal SLE

We study at first one kind of SLE, which starts at $0$ and always stays in the upper half-plane $\mathbb{H}$. We define $SLE_{\kappa}$ as follows.

$$
\begin{eqnarray*}
\partial_t g_t(z) &=& \frac{2}{g_t(z) - U_t}\\
g_0(z) &=& z\\
U_t &=& \sqrt{\kappa}B_t
\end{eqnarray*}
$$
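Here is a rough numerical sketch (all parameters and the swallowing criterion are crude illustrative choices of mine, not a standard algorithm): integrate the forward Loewner ODE for a grid of points of $\mathbb{H}$ with $U_t = \sqrt{\kappa} B_t$, and declare a point swallowed by $K_T$ once its image comes too close to the driving function.

    import numpy as np

    rng = np.random.default_rng(2)

    kappa, T, dt = 6.0, 1.0, 1e-4
    steps = int(T / dt)
    # Discretized driving function U_t = sqrt(kappa) * B_t
    U = np.concatenate(([0.0], np.sqrt(kappa * dt) * rng.standard_normal(steps).cumsum()))

    xs = np.linspace(-2.0, 2.0, 80)
    ys = np.linspace(0.02, 2.0, 50)
    z = xs[None, :] + 1j * ys[:, None]   # grid of initial points in H
    alive = np.ones(z.shape, dtype=bool)

    for k in range(steps):
        z[alive] += dt * 2.0 / (z[alive] - U[k])   # forward Loewner flow
        alive &= np.abs(z - U[k + 1]) > 0.05       # crude swallowing criterion

    print("fraction of the grid swallowed by K_T:", 1.0 - alive.mean())

The surviving points approximate $g_T^{-1}(\mathbb{H})$, and the swallowed ones approximate the hull $K_T$.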

Roughly speaking, what makes the difference is that the driving function is a Brownian motion, and we know that Brownian motion is universal in a certain sense. We list the most basic properties that characterize the chordal $SLE_{\kappa}$. We recall that the random object here is the compact set $K_t$; the conformal mappings serve to help us understand this random compact set.

  1. Markov on domain. Let $T$ be a stopping time, then
    $$
    \begin{eqnarray*}
    g_T(K_{T+t} \backslash K_T) - U_T & \perp & \mathcal{F}_T\\
    g_T(K_{T+t} \backslash K_T) - U_T & \sim^{d} & K_t\\
    \end{eqnarray*}
    $$
  2.  Scaling invariance
    $$
    \frac{1}{\sqrt{\lambda}}K_{\lambda t} \sim^{d} K_t
    $$
  3.  Symmetry.
    $$
    -K_t \sim^{d} K_t
    $$

We can compare these three properties with the basic properties of Brownian motion: they are totally parallel. That is why researchers now consider SLE to be as basic a random object in dimension 2 as Brownian motion.

However, one would like to know why the driving function must be a Brownian motion. In fact, many models in statistical physics require a conformal Markov property.
$$
\text{ Conformal Markov property } = \text{ Markov on domain } + \text{ Scaling invariance }
$$
In this case, we come back and see that the driving function should have stationary independent increments and be scaling invariant, so the only choice is Brownian motion.

Phase transition

The phase transition is a very interesting topic in chordal $SLE_{\kappa}$. A baby version is to ask how $K_t$ eats the real axis. The theorem is
  • If $\kappa \leq 4$, a.s. $\bigcup_{t \geq 0} K_t \cap \mathbb{R} = \{0\}$
  • If $\kappa > 4$, a.s. $\mathbb{R} \subset \bigcup_{t \geq 0} K_t$
The main idea is to write $X_t = \frac{g_t(1) - U_t}{\sqrt{\kappa}}$; then this problem transforms into a problem about a Bessel process (see the computation below), and we get the desired result.
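In more detail (a standard computation): writing $Y_t = g_t(1) - U_t$, the Loewner equation and $U_t = \sqrt{\kappa} B_t$ give
$$
dY_t = \frac{2}{Y_t}\,dt - \sqrt{\kappa}\,dB_t, \qquad dX_t = \frac{2/\kappa}{X_t}\,dt - dB_t,
$$
so $X_t$ is a Bessel process of dimension $d = 1 + \frac{4}{\kappa}$; it hits $0$ if and only if $d < 2$, i.e. $\kappa > 4$, which is exactly the phase transition stated above.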

The proof that $SLE_{\kappa}$ is generated by a curve is more difficult, but if we admit this property, then using the Markov property $g_t(K_{t+s} \backslash K_t) - U_t \sim^{d} \tilde{K}_s$, we see that when $\kappa \leq 4$ the image of the curve after time $t$ does not touch the axis; hence the curve does not intersect its past, and it is a simple curve. This is a very interesting result.

Wednesday, February 1, 2017

MMB (1) : Large deviation

This term, I am taking two M2 courses in Paris, at Orsay and at Polytechnique respectively, in order to enrich my knowledge of probability. Today, Pascal talked about large deviation theory.

We know the central limit theorem: for $\{X_i\}$ i.i.d. with finite variance,
$$
\sqrt{n} (\bar{X}_n - \mathbb{E}[X] ) \Rightarrow \mathcal{N}(0, Var(X))
$$
However, this only describes fluctuations at the scale $\frac{\sigma}{\sqrt{n}}$ around the mean; what happens to the distribution at large distance from the mean?

This requires the tool of large deviation estimates. It is a basic tool in mathematics and is used everywhere in probability and statistics. For probabilists, large deviations give the probability of atypical events and sometimes estimates of correlation functions. For statisticians, they give non-asymptotic confidence intervals. In a word, it is a necessary tool.

Inequality of Chernoff

We start from the typical Markov inequality: for $x > \mathbb{E}[X]$ and $\theta > 0$,
$$
\mathbb{P}\left[S_n / n > x\right] = \mathbb{P}[e^{\theta S_n} > e^{n \theta x}] \leq e^{-n(\theta x - \phi(\theta))}
$$
Optimizing over $\theta$, we get
$$
\mathbb{P}\left[S_n / n > x\right] \leq e^{- n I(x)}
$$
where we define
$$\begin{eqnarray}
\phi(\theta) &=& \log \mathbb{E}[e^{\theta X}] \\
I(x) &=& \sup_{\theta \in \mathbb{R}} \left( \theta x - \phi(\theta) \right)
\end{eqnarray}$$
However, from this simple inequality, a lot of technique has been developed.
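For example (a standard computation added here for illustration), if $X \sim \mathrm{Bernoulli}(p)$, everything is explicit:
$$\begin{eqnarray*}
\phi(\theta) &=& \log(1 - p + pe^{\theta}) \\
I(x) &=& x\log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}, \qquad x \in [0,1]
\end{eqnarray*}$$
so $I(x)$ is the Kullback-Leibler divergence between Bernoulli($x$) and Bernoulli($p$), and $\mathbb{P}[S_n/n \geq x] \leq e^{-nI(x)}$ for $x > p$.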


Inequality of Hoeffding

We can give a better estimate when $X$ is bounded in $[a,b]$:
$$
\mathbb{P}\left[S_n / n - \mathbb{E}[X] > x\right] \leq \exp\left(- \frac{2 n x^2}{(b-a)^2}\right)
$$
The idea is to expand the function $\phi$ around $0$ and then give a good estimate of the second-order term.
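A quick Monte-Carlo check of this bound (illustrative parameters of mine; Bernoulli($1/2$) samples, so $a = 0$, $b = 1$):

    import numpy as np

    rng = np.random.default_rng(3)

    # Check P[S_n/n - 1/2 > x] <= exp(-2 n x^2) for Bernoulli(1/2) samples.
    n, x, trials = 200, 0.1, 100_000
    means = rng.binomial(n, 0.5, size=trials) / n
    print((means - 0.5 > x).mean(), np.exp(-2 * n * x**2))

The empirical tail (about $2 \cdot 10^{-3}$) sits well below the Hoeffding bound $e^{-4} \approx 1.8 \cdot 10^{-2}$.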

Some generalized versions are also possible. For example, we can consider not just one function but a family of functions - in other words, a dictionary. The more general theorems depend largely on covering, or approximation, arguments.

Inequality for Gaussian

However, a big obstacle for the Hoeffding inequality is the boundedness condition. How do we treat unbounded variables, for example Gaussian ones, which form a large and useful class?

One idea is an inequality of entropy type. The entropy of a function under the measure $\mu$ is defined by
$$ Ent_{\mu}(f) = \mathbb{E}_{\mu}(f \log{f}) - \mathbb{E}_{\mu}(f) \log{(\mathbb{E}_{\mu}(f))}$$
In some cases, the measure $\mu$ satisfies a log-Sobolev inequality
$$ Ent_{\mu}(f^2) \leq C_{\mu} \mathbb{E}_{\mu}(|\nabla f|^2) $$
In this case, for Lipschitz $f$, we can deduce the inequality
$$ \mathbb{P}(|f(Y) - \mathbb{E}(f(Y))| > \epsilon) \leq 2 \exp{\left(-\frac{\epsilon^2}{C_{\mu}|f|^2_{Lip}}\right)}$$
This is an inequality of large deviation type. A natural question is then which measures satisfy the log-Sobolev inequality. This requires analysis, and the answer for the Gaussian measure is positive; the general case, however, is a whole branch of research.
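A small numerical check for the Gaussian case (my illustrative choice: $f(y) = \max_i y_i$, which is $1$-Lipschitz, and $Y$ a standard Gaussian vector, whose log-Sobolev constant gives the bound $2e^{-\epsilon^2/2}$):

    import numpy as np

    rng = np.random.default_rng(4)

    # f = coordinate maximum is 1-Lipschitz; Y standard Gaussian in R^d.
    d, eps, trials = 50, 1.5, 100_000
    f = rng.standard_normal((trials, d)).max(axis=1)
    print((np.abs(f - f.mean()) > eps).mean(), 2 * np.exp(-eps**2 / 2))

The empirical tail is far below the bound: concentration for Lipschitz functions of Gaussians is dimension-free.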

Theorem of Cramér

A more general principle is the theorem of Cramér, which gives not only an upper bound but also a lower bound for the distribution:
$$\begin{eqnarray}
-\inf_{x \in \Gamma^{o}}I(x)
&\leq& \liminf_{n \rightarrow \infty} \frac{1}{n} \log \mathbb{P}\left[S_n / n \in \Gamma \right] \\
&\leq& \limsup_{n \rightarrow \infty} \frac{1}{n} \log \mathbb{P}\left[S_n / n \in \Gamma \right]
\leq -\inf_{x \in \overline{\Gamma}}I(x)
\end{eqnarray}$$
where $\Gamma^{o}$ is the interior and $\overline{\Gamma}$ the closure of $\Gamma$.

In the course, we analysed in detail some properties of the functions $\phi(\theta)$ and $I(x)$, such as their convexity, zeros and monotonicity. The upper bound is just like the Chernoff upper bound, while the lower bound uses a change of probability.
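Concretely, the change of probability is an exponential tilting: for $x$ in the range of $\phi'$, choose $\theta$ such that $\phi'(\theta) = x$ and define
$$
d\mu_{\theta}(y) = e^{\theta y - \phi(\theta)}\, d\mu(y)
$$
Under $\mu_{\theta}$ the mean of $X$ becomes exactly $x$, so the event $\{S_n/n \approx x\}$ is typical; converting back to $\mu$ on this event costs a factor of order $e^{-n(\theta x - \phi(\theta))} = e^{-nI(x)}$, which yields the lower bound.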