Tuesday, January 3, 2023

Dini's Lemma

In our last project, we used Dini's lemma in one step of a proof. This result, although it should be part of undergraduate analysis, really works like magic. It says that monotone convergence of continuous functions $\{f_n\}_{n \in \mathbb{N}}$ to a continuous limit $f$ gives locally uniform convergence.

The monotone convergence is the key. For the case of a decreasing sequence, we fix $N$; then for all $n \geq N$ we have $0 \leq f_n(y) - f(y) \leq f_N(y) - f(y)$. We then apply the classical triangle inequality on a $\delta$-neighborhood of $x$, for both $f$ and $f_N$:
$$f_n(y) - f(y) \leq f_N(y) - f(y) \leq \vert f_N(y) - f_N(x)\vert  + \vert f_N(x) - f(x)\vert + \vert f(x) - f(y)\vert. $$
So the behavior of $f_N$ and $f$ near $x$ alone controls the convergence of the whole tail $n \geq N$, uniformly on that neighborhood.
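A quick numerical illustration (not part of the proof): take $f_n(x) = x^n$, which decreases pointwise to $0$ on $[0,1)$. Dini's lemma gives uniform convergence on every compact subset of $[0,1)$, but the rate degrades as the compact set approaches $1$, and on all of $[0,1)$ there is no uniform rate. A minimal sketch, with my own helper name `sup_gap`:

```python
import numpy as np

# f_n(x) = x**n decreases pointwise to 0 on [0, 1); the limit is continuous
# there, so Dini's lemma gives uniform convergence on every compact subset.
def sup_gap(n, xs):
    """sup of f_n - f on the grid xs, where f_n(x) = x**n and f = 0."""
    return float(np.max(xs ** n))

inside = np.linspace(0.0, 0.9, 1000)      # compact subset of [0, 1)
near_one = np.linspace(0.0, 0.999, 1000)  # compact subset closer to 1

for n in (10, 100, 1000):
    print(n, sup_gap(n, inside), sup_gap(n, near_one))
```

On $[0, 0.9]$ the gap decays like $0.9^n$, while on $[0, 0.999]$ it decays like $0.999^n$, much more slowly: uniform on each compact set, but with no rate shared by all of them.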

Sunday, August 22, 2021

Connection probability on oriented percolation does not depend on root

It has been a long time since I last updated my blog. This time I would like to revisit a question from the Alibaba competition (2021), as I did not solve it that day.

Question: on a general connected graph $G=(V, E)$, every bond carries two oriented edges. We put an oriented percolation model on it, each oriented edge being open independently with probability $p$. We choose a vertex as the root; prove that the probability that there is an oriented open path from every vertex to the root is independent of the choice of root.

It is natural to look for a bijection, but that is not easy. During the exam I only treated the very simple case where $G$ is a tree, which is easy. Still, that case is useful, because we will see an argument that combines the two.

We denote by $h(a) = \mathbb P_p[G \to a]$ the connection probability. It suffices to prove that $h(a) = h(b)$ for every pair of neighbors $a \sim b$. We proceed by induction on the number of bonds $\vert E \vert$. The base case is the tree, which we have already proved, so it remains to do the induction step.

For every configuration $\omega$, we define $$Pivot(\omega, a, \overrightarrow{ba}) := \{v \in V: v \to a \text{ when } \omega(\overrightarrow{ba}) = 1, v \nrightarrow a \text{ otherwise}\}.$$
Then we treat two cases:

Case 1: $\omega \in \{G \to a\}$ and $Pivot(\omega, a, \overrightarrow{ba}) = \emptyset$. In this case, $\omega$ with $\overrightarrow{ba}$ removed can be viewed as a connection configuration in $(V, E\setminus \{a,b\})$. By the induction hypothesis, the probability of this event does not depend on the choice of $a, b$: we add the oriented edge back, and we may flip it.

Case 2: $\omega \in \{G \to a\}$ and $Pivot(\omega, a, \overrightarrow{ba}) \neq \emptyset$. This implies that $\omega(\overrightarrow{ba}) = 1$ and $b \in Pivot(\omega, a, \overrightarrow{ba})$. We add one more restriction, $a \nrightarrow b$; otherwise the configuration lies in the part common to both events. Then we flip the orientation of $\overrightarrow{ba}$, and one can check that
$$\mathbb P_p[\omega \in \{G \to a\} \cap \{a \nrightarrow b\}, b \in Pivot(\omega, a, \overrightarrow{ba}) ] \\= \mathbb P_p[\omega \in \{G \to b\} \cap \{b \nrightarrow a\}, a \in Pivot(\omega, b, \overrightarrow{ab}) ].$$

With these two cases, the induction step is finished and the result holds for a general graph $G$.
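As a sanity check (not the proof), one can verify the statement by exact enumeration on a toy graph: each bond gives two oriented edges, each open independently with probability $p$, and we compute $h(\text{root})$ for every choice of root. The graph, the value $p = 1/3$, and the function names below are my own choices:

```python
from itertools import product
from fractions import Fraction

# Toy graph: triangle {0, 1, 2} plus a pendant vertex 3 attached to 2.
bonds = [(0, 1), (1, 2), (0, 2), (2, 3)]
# Every bond carries two oriented edges; each is open independently with prob p.
arcs = [(a, b) for (a, b) in bonds] + [(b, a) for (a, b) in bonds]
V = {0, 1, 2, 3}
p = Fraction(1, 3)  # exact arithmetic, so the comparison below is exact

def all_connected_to(root, open_arcs):
    """True if every vertex has an oriented open path to `root`."""
    seen, stack = {root}, [root]
    while stack:  # explore backwards along open arcs
        v = stack.pop()
        for (a, b) in open_arcs:
            if b == v and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen == V

def h(root):
    """P_p[G -> root], by exact enumeration of all 2^(2|E|) configurations."""
    total = Fraction(0)
    for states in product([0, 1], repeat=len(arcs)):
        k = sum(states)
        open_arcs = [e for e, s in zip(arcs, states) if s]
        if all_connected_to(root, open_arcs):
            total += p ** k * (1 - p) ** (len(arcs) - k)
    return total

values = [h(r) for r in sorted(V)]
print(values)
```

With exact `Fraction` arithmetic the four values coincide, as the theorem predicts, with no floating-point doubt.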

Monday, November 9, 2020

A small theorem of Boolean approximation

I would like to record a theorem I learned in this year's Alibaba competition. In fact, this problem also appeared in the mathematics competition for high school students at Peking University. Let $(a_i)_{1 \leq i \leq n}$ be a sequence of positive numbers such that $\sum_{i=1}^n a_i = a$ and $\sum_{i=1}^n (a_i)^2 = 1$. Then we can always find signs $\epsilon_i \in \{\pm 1\}$ such that $$\left\vert \sum_{i=1}^n \epsilon_i a_i \right\vert \leq \frac{1}{a}.$$

Let us see what this theorem tells us. By the Cauchy–Schwarz inequality, $a \leq \sqrt{n}$, with equality when all the numbers are equal: $a = \sqrt{n}$ and every $a_i = \frac{1}{\sqrt{n}}$. Then the best Boolean approximation $\sum_{i=1}^n \epsilon_i a_i$ comes very close to $0$, with an error of at most $\frac{1}{\sqrt{n}}$ --- the size of a single term.

This makes one think of concentration inequalities in probability --- Markov's inequality, Hoeffding's inequality, etc. That was my reaction during the Alibaba competition, but I have to say this is a very dangerous trap in this question. In fact, even after the competition I kept trying this idea, and there seems to be no easy way to make it work. Recall what a concentration inequality teaches us: if we take the $\epsilon_i$ to be i.i.d. uniform signs, the sum is centered and Hoeffding's inequality gives $$\mathbb{P}\left[\left\vert \sum_{i=1}^n \epsilon_i a_i \right\vert \geq \frac{1}{a} \right] \leq 2\exp\left(-\frac{1}{2a^2}\right).$$
As $a$ grows, this bound tends to $2$ and carries no information. Indeed, a concentration inequality tells us that the mass lies within a window of size $\sigma$, but it says nothing about how finely the mass concentrates inside that window. In this question a random choice is clearly not good, because the standard deviation of $\sum_{i=1}^n \epsilon_i a_i$ is $1$, while we aim for an error of $\frac{1}{a}$.

One has to keep in mind that the probabilistic method is good and cool, but not the only way and sometimes not the best way.

A simple solution is induction with a one-step exploration --- what some call the greedy method. The problem is equivalent to proving $$\left(\min_{\epsilon_i \in \{\pm 1\}} \left\vert \sum_{i=1}^n \epsilon_i a_i \right\vert \right) \left(\sum_{i=1}^n a_i\right) \leq \sum_{i=1}^n (a_i)^2.$$ We first optimize over the signs of $n$ variables and then choose the sign of the last one. We can also manipulate which variable comes last: for example, one can let $a_{n+1}$ always be the smallest, so that its influence is always smaller than $\frac{1}{a}$. Once the correct direction is found, the theorem is not difficult.
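The greedy step can be made concrete: sort the $a_i$ in decreasing order and pick each sign to shrink the running sum, so that $\vert S_k \vert = \vert\, \vert S_{k-1}\vert - a_k \vert$ at every step. A short induction on this invariant shows $\vert S \vert \cdot \sum a_i \leq \sum a_i^2$. A sketch in Python, with my own helper name `greedy_signs`:

```python
import random
from itertools import product

def greedy_signs(a):
    """Visit indices by decreasing a_i and choose each sign opposite to the
    running sum, so that |S_k| = ||S_{k-1}| - a_k| at every step."""
    order = sorted(range(len(a)), key=lambda i: -a[i])
    eps, s = [0] * len(a), 0.0
    for i in order:
        eps[i] = -1 if s > 0 else 1
        s += eps[i] * a[i]
    return eps, s

random.seed(0)
for _ in range(100):
    a = [random.uniform(0.01, 1.0) for _ in range(8)]
    _, s = greedy_signs(a)
    # the induction invariant: |S| * sum(a_i) <= sum(a_i^2)
    assert abs(s) * sum(a) <= sum(x * x for x in a) + 1e-12
    # the true minimum over all 2^n sign choices can only be smaller
    best = min(abs(sum(e * x for e, x in zip(sig, a)))
               for sig in product([1, -1], repeat=len(a)))
    assert best <= abs(s) + 1e-12
print("greedy satisfies the bound on all random instances")
```

Note that the greedy sum need not be the true minimum; the point is only that it already achieves the claimed bound.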

Thursday, July 30, 2020

Variation argument is everywhere

Today I was asked a high-school-level question: given $1 \geq p_1 \geq p_2 \geq \cdots \geq p_n \geq 0$, find a subset $S$ of these numbers that maximizes the quantity
$$F(S) = \sum_{i=1}^{|S|}\frac{p_{\alpha_i}}{1 - p_{\alpha_i}} \left( \prod_{j=1}^{|S|}(1-p_{\alpha_j})\right).$$
If one searches naively over all subsets, the complexity is of course as large as $2^n$. I want to say that the variational argument, although simple, is very useful here. By considering adding one element, deleting one element, or replacing one element, one quickly sees that the optimal subset is described by a threshold $\tau$:
$$S := \{p_i\}_{1 \leq i \leq \tau}, \qquad \tau := \min \left\{k \in \mathbb{N}: \sum_{i=1}^k \frac{p_i}{1-p_i} \geq 1\right\},$$
with $\tau := n$ if the sum never reaches $1$.
Thus the complexity is reduced to $O(n)$. So let us always think about the variational argument.
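One can check the threshold description by brute force on a small instance. Here $F(S)$ is the quantity above (it is the probability that exactly one of the chosen independent events occurs), and the instance and the names `F`, `best_prefix` are my own:

```python
from itertools import combinations

def F(ps):
    """The target quantity: sum of odds p/(1-p) times the product of (1-p).
    It equals the probability that exactly one of the chosen events occurs."""
    prod = 1.0
    for q in ps:
        prod *= 1.0 - q
    return sum(q / (1.0 - q) for q in ps) * prod

def best_prefix(p):
    """p sorted decreasingly; take the shortest prefix whose odds sum
    reaches 1 (all of p if the sum never gets there)."""
    total = 0.0
    for k, q in enumerate(p, start=1):
        total += q / (1.0 - q)
        if total >= 1.0:
            return p[:k]
    return p

p = sorted([0.62, 0.45, 0.33, 0.21, 0.15, 0.08], reverse=True)
claimed = F(best_prefix(p))
brute = max(F(c) for r in range(1, len(p) + 1)
            for c in combinations(p, r))
print(claimed, brute)
```

The variational argument behind the threshold is visible in the identity $F(S \cup \{q\}) = F(S) + q \prod_{j \in S}(1-p_j)\,(1 - \sum_{i \in S} \frac{p_i}{1-p_i})$: adding an element helps exactly while the odds sum is below $1$.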

Sunday, July 5, 2020

Optimization from random constraints

This week I heard a very nice talk by my classmate Huajie Qian. The question can be stated as minimizing $c \cdot x$ over $x \in \mathbb R^d$ under the constraint $h(x, \zeta) \leq 0$, where $\zeta$ is a random variable. A very naive method is to say: we have data $\{\zeta_i\}_{1 \leq i \leq n}$, and we solve the problem so that all $n$ sampled constraints are satisfied. Finally this gives the estimate
$$\mathbb P_{\{\zeta_i\}_{1 \leq i \leq n}}\left[\mathbb P_{\zeta}(h(x,\zeta) \leq 0) \leq 1- \alpha\right] \leq {n \choose d}(1-\alpha)^{n-d},$$
which is a very nice result.

The key to the result above is a very nice lemma: in fact, one can find $d$ constraints among the $n$ that suffice to determine the solution. That is to say, only $d$ constraints matter.

To prove it, we need two theorems: Radon's theorem and Helly's theorem. The first says that in $\mathbb R^d$, any $(d+2)$ points can be partitioned into two sets $I_1, I_2$ whose convex hulls intersect. The second says that for convex sets $\{C_i\}_{1 \leq i \leq m}$ in $\mathbb R^d$, if any $(d+1)$ of them have a common point, then $\cap_{i \leq m} C_i$ is non-empty.

Using the two theorems, we suppose for contradiction that every choice of $d$ constraints gives a strictly better minimizer, project onto the hyperplane given by the direction $c$, and apply Helly's theorem there to reach a contradiction.
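A minimal sketch of the phenomenon in dimension $d = 1$, under assumptions I choose purely for illustration ($h(x, \zeta) = \zeta - x$ and $\zeta \sim \mathrm{Uniform}(0,1)$): the sampled problem $\min x$ subject to $x \geq \zeta_i$ has optimizer $x^* = \max_i \zeta_i$, so exactly one constraint supports the solution, and the failure probability of the chance constraint can be compared with the bound:

```python
import random

# Toy scenario problem with d = 1: minimize x subject to x >= zeta_i,
# zeta ~ Uniform(0, 1).  The optimizer is x* = max(zeta_1, ..., zeta_n),
# determined by a single support constraint.
random.seed(1)
n, alpha, trials = 20, 0.1, 100_000

bad = 0
for _ in range(trials):
    x_star = max(random.random() for _ in range(n))
    coverage = x_star            # P[zeta <= x*] for Uniform(0, 1)
    if coverage <= 1 - alpha:    # the chance constraint fails at level alpha
        bad += 1

estimate = bad / trials
exact = (1 - alpha) ** n             # P[max of n uniforms <= 1 - alpha]
bound = n * (1 - alpha) ** (n - 1)   # the estimate above with d = 1
print(estimate, exact, bound)
```

Here the exact failure probability $(1-\alpha)^n$ sits comfortably below the bound $\binom{n}{1}(1-\alpha)^{n-1}$, and the Monte Carlo estimate matches it.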

Saturday, July 4, 2020

Maximal correlation

This is a question about correlation. Let $X, Y$ be two Gaussian random variables $\mathcal{N}(0,1)$ with correlation $\beta$; prove that the best constant in the Cauchy--Schwarz-type inequality is $\vert\beta\vert$:
$$Cov(f(X), g(Y)) \leq \vert \beta \vert \sqrt{Var(f(X)) Var(g(Y))}.$$

In fact, one can define the maximal correlation of a pair of random variables as the best constant above, and of course it is at least $\beta$. Let us note how to prove the inequality above quickly. We use the expansion in Hermite polynomials, for which
$$\mathbb{E}[H_n(X) H_m(Y)] = \delta_{n,m}\, n! \left(\mathbb{E}[XY]\right)^n = \delta_{n,m}\, n!\, \beta^n.$$
A centered $L^2$ function has zero projection on $H_0$. Then we have
$$\mathbb{E}[f(X)g(Y)] = \sum_{n=1}^{\infty}\langle f, H_n \rangle \langle g, H_n \rangle \frac{\beta^n}{n!} \\ \leq \vert \beta \vert \sqrt{\sum_{n=1}^{\infty}\langle f, H_n \rangle^2 \frac{1}{n!} } \sqrt{\sum_{n=1}^{\infty}\langle g, H_n \rangle^2 \frac{1}{n!} } \\ = \vert \beta \vert \sqrt{Var(f(X)) Var(g(Y))}.$$
This concludes the proof.
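A Monte Carlo sanity check of the inequality (not a proof), with $\beta$ and the test functions picked arbitrarily by me:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.5
n = 200_000

# Correlated standard Gaussians with correlation beta.
x = rng.standard_normal(n)
y = beta * x + np.sqrt(1 - beta ** 2) * rng.standard_normal(n)

for f, g in [(np.square, np.square), (np.cos, np.sin), (np.tanh, np.square)]:
    fx, gy = f(x), g(y)
    cov = np.cov(fx, gy)[0, 1]
    bound = abs(beta) * np.sqrt(fx.var() * gy.var())
    print(f.__name__, g.__name__, cov, bound)
```

For $f = g = x^2$ one even knows $Cov(X^2, Y^2) = 2\beta^2$ exactly, well below the bound $2\vert\beta\vert$; the other pairs have covariance $0$ by symmetry.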

Friday, June 26, 2020

Strong Markov property for Brownian Bridge

The strong Markov property of Brownian motion is well known, and it also holds naturally for the Brownian bridge. Indeed, since the Brownian bridge ending at $(T, 0)$ can be thought of as a Brownian motion conditioned on its endpoint, or as a perturbation of a linear interpolation, it is natural that we can restart it from an intermediate point.

To prove rigorously, for example, the Markov property of the Brownian bridge, we have to do some calculus. Let $(W_t)_{t \geq 0}$ be a standard Brownian motion started from $0$, and define $(B_t)_{s \leq t \leq T}$ by
$$ B_t = x + W_t - W_s - \frac{t-s}{T-s}(x + W_T -W_s) ,$$
a Brownian bridge between times $s$ and $T$, with value $x$ at $s$ and value $0$ at $T$. One way to read this formula is that the term $x + W_t - W_s$ is the Brownian motion, from which we subtract the right multiple of the endpoint value so that the bridge vanishes at $T$. Some simple calculus shows that it equals
$$B_t = \frac{T-t}{T-s}(x + W_t - W_s) - \frac{t-s}{T-s}(W_T - W_t).$$

The Markov property is simple but requires some calculus: we would like to show that for $s < r < t < T$ we have
$$B_t = B_r + W_t - W_r - \frac{t-r}{T-r}(B_r + W_T -W_r). \quad (\star)$$ 
Now we prove it. An intermediate computation gives
$$ - \frac{t-r}{T-r}(B_r + W_T -W_r) = - \frac{t-r}{T-s}(x + W_T - W_s),$$ and we plug it into the formula:
$$RHS (\star) = x + W_r - W_s - \frac{r-s}{T-s}(x + W_T -W_s) + W_t - W_r - \frac{t-r}{T-s}(x + W_T - W_s)\\ = x + W_t - W_s - \frac{r-s}{T-s}(x + W_T -W_s) - \frac{t-r}{T-s}(x + W_T - W_s) \\ = x + W_t - W_s - \frac{t-s}{T-s}(x + W_T -W_s).$$ 
This proves the Markov property. The strong Markov property then follows by approximation, using the regularity of the trajectories.
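Note that $(\star)$ is a pathwise identity: it holds for every realization of $W$, not only in law. So a numerical check is easy: sample one Brownian path at the few times needed, build $B_t$ directly from $(s, x)$ and again by restarting from $(r, B_r)$, and compare. The time points and the helper name `bridge` are my own:

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary times with s < r < t < T, and an arbitrary starting value x.
s, r, t, T, x = 0.2, 0.5, 0.8, 1.0, 1.3
times = [0.0, s, r, t, T]
# Sample the Brownian motion at exactly the times we need.
increments = rng.standard_normal(4) * np.sqrt(np.diff(times))
W = dict(zip(times, np.concatenate([[0.0], np.cumsum(increments)])))

def bridge(u, start, xval):
    """Value at time u of the bridge worth xval at time `start` and 0 at T."""
    return xval + W[u] - W[start] - (u - start) / (T - start) * (xval + W[T] - W[start])

B_t_direct = bridge(t, s, x)       # bridge built from (s, x)
B_r = bridge(r, s, x)
B_t_restarted = bridge(t, r, B_r)  # same bridge restarted from (r, B_r)
print(B_t_direct, B_t_restarted)
```

The two values agree up to floating-point error, which is exactly the content of $(\star)$.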