Monday, November 9, 2020

A small theorem of Boolean approximation

I would like to record a theorem I learned in this year's Alibaba competition. The same problem has also appeared in a maths competition at Peking University for high school students. Let $(a_i)_{1 \leq i \leq n}$ be a sequence of positive numbers such that $\sum_{i=1}^n a_i = a$ and $\sum_{i=1}^n (a_i)^2 = 1$. Then we can always find signs $\epsilon_i \in \{\pm 1\}$ such that $$\left\vert \sum_{i=1}^n \epsilon_i a_i \right\vert  \leq \frac{1}{a}.$$

Let us see what this theorem tells us. By the inequality between the arithmetic and quadratic means, $a \leq \sqrt{n}$, with equality exactly when all the numbers are equal, that is, $a_i = \frac{1}{\sqrt{n}}$ for every $i$. In that case the best Boolean approximation $\sum_{i=1}^n \epsilon_i a_i$ lands within $\frac{1}{\sqrt{n}}$ of $0$, which is the size of a single term (and for odd $n$ one cannot do better).

This makes us think of concentration inequalities in probability, such as the Markov inequality or the Hoeffding inequality. That is what I tried during the Alibaba competition, but I have to say it is a very dangerous trap for this question. Even after the competition I kept trying this idea many times, and there seems to be no easy way to make it work. Let us recall what concentration teaches us: if the $\epsilon_i$ are independent centered signs, Hoeffding's inequality gives $$\mathbb{P}\left[\left\vert \sum_{i=1}^n \epsilon_i a_i \right\vert  \geq \frac{1}{a} \right] \leq 2\exp\left(-\frac{1}{2a^2}\right).$$
Since $a \geq 1$, the right-hand side is larger than $1$: the bound blows up and gives no useful information. Indeed, a concentration inequality tells us that the measure sits in the $\sigma$ region, but it says nothing about how the mass is spread inside the $\sigma$ region. In this question, a random choice is clearly not good enough, because the standard deviation $\sigma$ of $\sum_{i=1}^n \epsilon_i a_i $ is $1$, while we are asking for an error of size $\frac{1}{a}$.

One has to keep in mind that the probabilistic method is good and cool, but not the only way and sometimes not the best way.

A simple solution goes by induction with a one-step exploration, which some people call the greedy method. By homogeneity, the problem is equivalent to proving $$\left(\min_{\epsilon_i \in \{\pm 1\}} \left\vert \sum_{i=1}^n \epsilon_i a_i \right\vert  \right) \left(\sum_{i=1}^n a_i\right) \leq \sum_{i=1}^n (a_i)^2$$ for all positive $a_i$. We first optimize the signs for $n$ variables and then choose the sign of the last one. We are also free to decide which variable comes last. For example, one can let $a_{n+1}$ always be the smallest one, so that its influence is at most $\frac{1}{a}$, because $a_{n+1} \sum_{i=1}^{n+1} a_i \leq \sum_{i=1}^{n+1} (a_i)^2$. Once we head in the right direction, the theorem is not difficult.
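The homogeneous form of the inequality is easy to test numerically for small $n$. The following is only a brute-force sanity check, not the inductive proof; the function names `best_signed_sum` and `satisfies_bound` are made up for this sketch.

```python
from itertools import product


def best_signed_sum(a):
    """Brute-force minimum of |sum eps_i * a_i| over all sign choices eps in {-1, +1}^n."""
    return min(
        abs(sum(e * x for e, x in zip(eps, a)))
        for eps in product((-1, 1), repeat=len(a))
    )


def satisfies_bound(a, tol=1e-9):
    """Check (min |sum eps_i a_i|) * (sum a_i) <= sum a_i^2, up to rounding error."""
    return best_signed_sum(a) * sum(a) <= sum(x * x for x in a) + tol


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    for _ in range(100):
        n = rng.randint(1, 10)
        a = [rng.uniform(0.01, 1.0) for _ in range(n)]
        assert satisfies_bound(a)
    print("inequality held on all random samples")
```

The search costs $2^n$ evaluations, so it is only meant for small $n$.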

Thursday, July 30, 2020

Variational arguments are everywhere

Today I was asked a high school level question: given $1 > p_1 \geq p_2 \geq \cdots \geq p_n \geq 0$, find a subset $S = \{p_{\alpha_1}, \dots, p_{\alpha_{|S|}}\}$ of these numbers that maximizes the quantity
$$F(S) = \sum_{i=1}^{|S|}\frac{p_{\alpha_i}}{1 - p_{\alpha_i}} \left( \prod_{j=1}^{|S|}(1-p_{\alpha_j})\right).$$
A naive search over all subsets has complexity of order $2^n$. I want to say that a variational argument, although simple, is very useful here. By comparing the effect of adding one more element, deleting one element, or replacing one element, one quickly sees that the optimal subset is described by a threshold $\tau$:
$$S := \{p_i\}_{1 \leq i \leq \tau}, \qquad \tau := \min \left\{k \leq n: \sum_{i=1}^k \frac{p_i}{1-p_i} \geq 1\right\},$$
with the convention $\tau := n$ if the sum never reaches $1$. Thus the complexity is reduced to $O(n)$. So let us always think about the variational argument.
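As a sanity check, the threshold rule can be compared with exhaustive search on small inputs. This is a minimal sketch assuming all $p_i < 1$ and that the input is already sorted in decreasing order; the names `F`, `threshold_subset` and `brute_force_max` are chosen here for illustration.

```python
from itertools import combinations
from math import prod


def F(ps):
    """Objective from the text: (sum of p/(1-p)) times (product of (1-p)); requires p < 1."""
    return sum(p / (1 - p) for p in ps) * prod(1 - p for p in ps)


def threshold_subset(p_sorted):
    """Take the largest p's until the accumulated p/(1-p) reaches 1 (or the list ends)."""
    chosen, total = [], 0.0
    for p in p_sorted:
        chosen.append(p)
        total += p / (1 - p)
        if total >= 1:
            break
    return chosen


def brute_force_max(ps):
    """Maximum of F over all subsets, including the empty one; exponential cost."""
    return max(F(c) for k in range(len(ps) + 1) for c in combinations(ps, k))


if __name__ == "__main__":
    ps = [0.5, 0.4, 0.3, 0.2, 0.1]
    print(F(threshold_subset(ps)), brute_force_max(ps))
```

The single pass in `threshold_subset` is the $O(n)$ procedure; the exhaustive search is only there to confirm the answer on small examples.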

Sunday, July 5, 2020

Optimization under random constraints

This week I heard a very nice talk by my classmate Huajie Qian. The question is to solve $\min_{x}c \cdot x$ under the constraint $h(x, \zeta) \leq 0$, where $\zeta$ is a random variable. A very naive method is the following: given data $\{\zeta_i\}_{1 \leq i \leq n}$, we solve the problem so that the constraint is satisfied for these $n$ samples. This already gives the nice estimate
$$\mathbb P_{\{\zeta_i\}_{1 \leq i \leq n}}\left[\mathbb P_{\zeta}(h(x,\zeta) \leq 0) \leq 1- \alpha\right] \leq {n \choose d}(1-\alpha)^{n-d},$$
where $x \in \mathbb R^d$ denotes the solution of the sampled problem.

The proof of the result above relies on a very nice lemma: one can find $d$ constraints among the $n$ that already solve the problem. That is to say, only $d$ constraints matter.
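To get a feeling for the estimate, here is a toy simulation. Everything in it is an assumption made for illustration and is not taken from the talk: $d = 1$, $c = -1$, $h(x, \zeta) = x - \zeta$ and $\zeta$ uniform on $[0,1]$. The sampled solution is then $\min_i \zeta_i$, which is determined by a single constraint, and it satisfies a fresh constraint with probability $1 - \min_i \zeta_i$.

```python
import math
import random


def bad_event_frequency(n, alpha, trials=20_000, seed=0):
    """Estimate P[ P_zeta(h(x_hat, zeta) <= 0) <= 1 - alpha ] in the toy model.

    Toy model (an assumption of this sketch): minimise -x subject to
    x <= zeta_i, with zeta_i ~ Uniform(0, 1), so x_hat = min(zeta_i).
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x_hat = min(rng.random() for _ in range(n))
        # A fresh zeta satisfies x_hat <= zeta with probability 1 - x_hat.
        if 1 - x_hat <= 1 - alpha:
            bad += 1
    return bad / trials


def scenario_bound(n, d, alpha):
    """Right-hand side of the estimate: C(n, d) * (1 - alpha)^(n - d)."""
    return math.comb(n, d) * (1 - alpha) ** (n - d)


if __name__ == "__main__":
    n, alpha = 30, 0.2
    print(bad_event_frequency(n, alpha), scenario_bound(n, 1, alpha))
```

In this toy model the exact probability is $(1-\alpha)^n$, which indeed sits below $n(1-\alpha)^{n-1}$.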

To prove it, we need two theorems: Radon's theorem and Helly's theorem. The first one says that any $(d+2)$ points in $\mathbb R^d$ can be partitioned into two sets $I_1, I_2$ whose convex hulls have non-empty intersection. The second one says that for convex sets $\{C_i\}_{1 \leq i \leq m}$ in $\mathbb R^d$, if any $(d+1)$ of them have non-empty intersection, then $\cap_{i \leq m} C_i$ is non-empty.

Using the two theorems, we argue by contradiction: suppose that every choice of $d$ constraints gives a strictly better minimiser than the full problem. We then project onto the hyperplane given by the direction $c$ and apply Helly's theorem there to reach a contradiction.

Saturday, July 4, 2020

Maximal correlation

This is a question about correlation. Let $X,Y$ be two Gaussian random variables $\mathcal{N}(0,1)$ with correlation $\beta$. Prove that the best constant in the Cauchy-Schwarz inequality is $\vert \beta \vert$, that is,
$$Cov(f(X), g(Y)) \leq \vert \beta \vert \sqrt{Var(f(X)) Var(g(Y))}.$$

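In fact, one can define the maximal correlation of two random variables as the best constant above, and of course it is at least $\vert \beta \vert$. Let us remark how to prove the inequality above quickly. We expand in Hermite polynomials $(H_n)_{n \geq 0}$, normalized so that $\mathbb{E}[H_n(X)^2] = n!$, for which we have
$$\mathbb{E}[H_n(X) H_m(Y)] = \delta_{n,m} \, n! \left(\mathbb{E}[XY]\right)^n = \delta_{n,m} \, n! \, \beta^n.$$
A centered $L^2$ function has zero projection on $H_0$. Then for centered $f$ and $g$, writing $\langle f, H_n \rangle := \mathbb{E}[f(X) H_n(X)]$, we have
$$\begin{aligned} \mathbb{E}[f(X)g(Y)] &= \sum_{n=1}^{\infty}\langle f, H_n \rangle \langle g, H_n \rangle \frac{1}{n!} \beta^n \\ &\leq \vert \beta \vert \sqrt{\sum_{n=1}^{\infty}\langle f, H_n \rangle^2 \frac{1}{n!} } \sqrt{\sum_{n=1}^{\infty}\langle g, H_n \rangle^2 \frac{1}{n!} } \\ &= \vert \beta \vert \sqrt{Var(f(X)) Var(g(Y))}, \end{aligned}$$
where the inequality uses $\vert \beta \vert^n \leq \vert \beta \vert$ for $n \geq 1$ and the Cauchy-Schwarz inequality.
This concludes the proof. Taking $f(x) = x$ and $g(x) = \pm x$ (with the sign of $\beta$) gives equality, so the constant $\vert \beta \vert$ cannot be improved.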
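The inequality can also be observed by simulation. The sketch below is only a Monte Carlo illustration with hypothetical helper names: it builds $Y = \beta X + \sqrt{1-\beta^2}\, Z$ with $Z$ an independent standard Gaussian, and estimates the correlation of $f(X)$ and $g(Y)$. For $f(x) = g(x) = x$ the estimate should be close to $\beta$, and for $f(x) = g(x) = x^2$ close to $\beta^2$, which is smaller than $\vert \beta \vert$.

```python
import random
from statistics import fmean


def sample_pair(beta, rng):
    """One sample of standard Gaussians (X, Y) with correlation beta."""
    x = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)
    return x, beta * x + (1 - beta ** 2) ** 0.5 * z


def empirical_correlation(f, g, beta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of Cov(f(X), g(Y)) / sqrt(Var f(X) * Var g(Y))."""
    rng = random.Random(seed)
    fs, gs = [], []
    for _ in range(n_samples):
        x, y = sample_pair(beta, rng)
        fs.append(f(x))
        gs.append(g(y))
    mf, mg = fmean(fs), fmean(gs)
    cov = fmean((u - mf) * (v - mg) for u, v in zip(fs, gs))
    var_f = fmean((u - mf) ** 2 for u in fs)
    var_g = fmean((v - mg) ** 2 for v in gs)
    return cov / (var_f * var_g) ** 0.5


if __name__ == "__main__":
    beta = 0.6
    print(empirical_correlation(lambda x: x, lambda y: y, beta))
    print(empirical_correlation(lambda x: x * x, lambda y: y * y, beta))
```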

Friday, June 26, 2020

Strong Markov property for Brownian Bridge

The strong Markov property for Brownian motion is well known, and it also holds naturally for the Brownian bridge. Indeed, since a Brownian bridge ending at $(T, 0)$ can be thought of as a Brownian motion conditioned on its endpoint, or as a perturbation of a linear interpolation, it is natural that the property survives when we restart from an intermediate point.

To prove it rigorously, say the Markov property for the Brownian bridge, we have to do some calculation. Let $(W_t)_{t \geq 0}$ be a standard Brownian motion issued from $0$, and define $(B_t)_{s \leq t \leq T}$ by
$$ B_t = x + W_t - W_s - \frac{t-s}{T-s}(x + W_T -W_s) ,$$
a Brownian bridge on $[s, T]$ with value $x$ at time $s$ and value $0$ at time $T$. One way to read this formula: the term $x + W_t - W_s$ is a Brownian motion started from $x$ at time $s$, from which we subtract a linear correction so that the value at the endpoint $T$ vanishes. A simple calculation shows that it is equal to
$$B_t = \frac{T-t}{T-s}(x + W_t - W_s) - \frac{t-s}{T-s}(W_T - W_t).$$

The Markov property is simple to state but requires a calculation: we would like to show that for $s < r < t < T$ we have
$$B_t = B_r + W_t - W_r - \frac{t-r}{T-r}(B_r + W_T -W_r). \quad (\star)$$ 
Now we prove it. Substituting the definition of $B_r$, an intermediate step tells us
$$ - \frac{t-r}{T-r}(B_r + W_T -W_r) = - \frac{t-r}{T-s}(x + W_T - W_s),$$
and we put it into the formula to get
$$\begin{aligned} RHS (\star) &= x + W_r - W_s - \frac{r-s}{T-s}(x + W_T -W_s) + W_t - W_r - \frac{t-r}{T-s}(x + W_T - W_s) \\ &= x + W_t - W_s - \frac{r-s}{T-s}(x + W_T -W_s) - \frac{t-r}{T-s}(x + W_T - W_s) \\ &= x + W_t - W_s - \frac{t-s}{T-s}(x + W_T -W_s). \end{aligned}$$
This proves the Markov property. The strong Markov property then follows by approximating the stopping time and using the regularity of the trajectories.
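The identity $(\star)$ is pure algebra, so it can be checked on a single simulated path. The following is a small sketch; the helper names `sample_brownian` and `bridge` and the chosen time points are assumptions made for illustration.

```python
import random


def sample_brownian(times, rng):
    """Sample a standard Brownian motion (started at 0) at the given positive times."""
    W, prev_t, prev_w = {}, 0.0, 0.0
    for t in sorted(times):
        prev_w += rng.gauss(0.0, (t - prev_t) ** 0.5)
        prev_t = t
        W[t] = prev_w
    return W


def bridge(x, s, T, t, W):
    """B_t = x + W_t - W_s - (t-s)/(T-s) * (x + W_T - W_s), the defining formula."""
    return x + W[t] - W[s] - (t - s) / (T - s) * (x + W[T] - W[s])


if __name__ == "__main__":
    rng = random.Random(0)
    s, r, t, T, x = 0.5, 1.0, 1.5, 2.0, 0.7
    W = sample_brownian([s, r, t, T], rng)
    b_r = bridge(x, s, T, r, W)
    b_t = bridge(x, s, T, t, W)
    # Restarting the same formula from (r, B_r) reproduces B_t, which is (star).
    print(abs(bridge(b_r, r, T, t, W) - b_t))
```

The printed difference should be zero up to floating point rounding.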