
Monday, November 9, 2020

A small theorem of Boolean approximation

I would like to record a theorem I learned in this year's Alibaba competition. The same problem has also appeared in the Peking University mathematics competition for high school students. Let (a_i)_{1 \leq i \leq n} be a positive sequence such that \sum_{i=1}^n a_i = a and \sum_{i=1}^n (a_i)^2 = 1. Then we can always find signs \epsilon_i \in \{\pm 1\} such that \left|\sum_{i=1}^n \epsilon_i a_i\right| \leq \frac{1}{a}.

Let us see what this theorem tells us. From the inequality between the arithmetic and quadratic means, a \leq \sqrt{n}, with equality when all the numbers are equal: a = \sqrt{n} and every a_i = \frac{1}{\sqrt{n}}. Then with the best Boolean approximation \sum_{i=1}^n \epsilon_i a_i, one gets a number very close to 0 with an error \frac{1}{\sqrt{n}}, which is the size of a single term.

This makes us think of concentration inequalities in probability, like the Markov inequality, the Hoeffding inequality, etc. That is what I tried during the Alibaba competition, but I have to say it is a very dangerous trap in this question. Even after the competition, I kept trying this idea many times, but there seems to be no easy way to make it work. Let us recall what a concentration inequality teaches us: if we take the \epsilon_i to be independent centered signs, the measure is concentrated and \mathbb P\left[\left|\sum_{i=1}^n \epsilon_i a_i\right| \geq \frac{1}{a}\right] \leq 2\exp\left(-\frac{1}{2a^2}\right).
Since a \geq 1, the right-hand side is larger than 1, so the bound is trivial and gives no useful information. Indeed, a concentration inequality tells us that the measure lives in the \sigma region, but it says nothing about how the measure is distributed inside the \sigma region. In this question, a random choice is clearly not good, because the \sigma of \sum_{i=1}^n \epsilon_i a_i is 1.
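
To see how rarely a random choice succeeds, here is a small sketch for the equal case a_i = \frac{1}{\sqrt{n}} only. The function name `prob_random_signs_hit` is my own. In that case the signed sum is (2K - n)/\sqrt{n} with K binomial, and 1/a = 1/\sqrt{n}, so the event is simply |2K - n| \leq 1 and the probability can be computed exactly.

```python
from math import comb


def prob_random_signs_hit(n):
    """P[|sum eps_i a_i| <= 1/a] for a_i = 1/sqrt(n) and independent uniform signs.

    With K = number of plus signs, the sum equals (2K - n)/sqrt(n) and
    1/a = 1/sqrt(n), so the event is |2K - n| <= 1.
    """
    hits = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) <= 1)
    return hits / 2 ** n


if __name__ == "__main__":
    for n in (4, 100, 10000):
        print(n, prob_random_signs_hit(n))
```

The probability decays as n grows, which matches the remark that a random configuration is not the right tool here, although a good configuration always exists.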

One has to keep in mind that the probabilistic method is good and cool, but not the only way and sometimes not the best way.

A simple solution is by induction with a one-step exploration, which some people call the greedy method. After removing the normalization, the problem is equivalent to proving \left(\min_{\epsilon \in \{\pm 1\}^n} \left|\sum_{i=1}^n \epsilon_i a_i\right|\right)\left(\sum_{i=1}^n a_i\right) \leq \sum_{i=1}^n (a_i)^2. We first do the optimization for n variables and then choose the sign of the last one. We can also choose which variable comes last. For example, one can let a_{n+1} always be the smallest one, so that its influence is always at most \frac{1}{a}. Once we get the correct direction, the theorem is not difficult.
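
Here is a minimal sketch of one way to read the greedy idea, not necessarily the exact induction used in the competition solution. It assumes we process the terms from largest to smallest, so the smallest term comes last, and always pick the sign that shrinks the running sum. The function names are mine. A brute-force search over all sign patterns is included for comparison, and both are checked against the homogeneous bound \sum a_i^2 / \sum a_i.

```python
from itertools import product


def best_signed_sum(a):
    """Brute force: minimum of |sum eps_i * a_i| over all sign patterns."""
    return min(
        abs(sum(e * x for e, x in zip(eps, a)))
        for eps in product((1, -1), repeat=len(a))
    )


def greedy_signed_sum(a):
    """Greedy: take terms in decreasing order and always choose the sign
    opposite to the current running sum. Only its absolute value is tracked."""
    s = 0.0
    for x in sorted(a, reverse=True):
        s = abs(s - x)
    return s


def homogeneous_bound(a):
    """sum a_i^2 / sum a_i, which equals 1/a under the normalisation sum a_i^2 = 1."""
    return sum(x * x for x in a) / sum(a)


if __name__ == "__main__":
    a = [0.9, 0.7, 0.4, 0.3, 0.2]
    print(best_signed_sum(a), greedy_signed_sum(a), homogeneous_bound(a))
```

The brute force costs 2^n evaluations, while the greedy pass costs one sort. In the cases I tried, the greedy value already sits below the bound, although it is not always the true minimum.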

Thursday, July 30, 2020

Variation argument is everywhere

Today I was asked a high school level question: given 1 \geq p_1 \geq p_2 \geq \cdots \geq p_n \geq 0, find a subset S = \{p_{\alpha_1}, \cdots, p_{\alpha_{|S|}}\} of these numbers that maximizes the quantity
F(S) = \sum_{i=1}^{|S|}\frac{p_{\alpha_i}}{1 - p_{\alpha_i}} \left( \prod_{j=1}^{|S|}(1-p_{\alpha_j})\right).
If one searches naively over all subsets, the complexity is of course as large as 2^n. I want to say that a variational argument, although simple, is very useful. By considering adding one more element, deleting one element, or replacing one element, one quickly sees that the optimal subset is described by a threshold \tau:
S := \{p_i\}_{1 \leq i \leq \tau}, \tau := \min \left\{m \in \mathbb{N}: \sum_{i=1}^m \frac{p_i}{1-p_i} \geq 1\right\}.
Thus the complexity is reduced to O(n), since the p_i are already sorted. So let us always think about the variational argument.
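
As a sanity check, here is a small sketch comparing the threshold rule with an exhaustive search. The function names are mine. Two conventions are my own assumptions: if the sum of the odds never reaches 1, the whole list is taken, and an element with p = 1 is treated as having infinite odds. F is written as \sum_i p_i \prod_{j \neq i}(1-p_j), which is the same quantity without the division by 1 - p_i.

```python
from itertools import combinations
from math import prod


def F(ps):
    """sum_i p_i * prod_{j != i} (1 - p_j), the objective without dividing by 1 - p_i."""
    return sum(
        p * prod(1 - q for j, q in enumerate(ps) if j != i)
        for i, p in enumerate(ps)
    )


def threshold_subset(ps):
    """Take the largest p_i until the sum of odds p/(1-p) reaches 1."""
    ordered = sorted(ps, reverse=True)
    odds = 0.0
    for k, p in enumerate(ordered, start=1):
        if p >= 1:            # infinite odds: stop immediately (assumed convention)
            return ordered[:k]
        odds += p / (1 - p)
        if odds >= 1:
            return ordered[:k]
    return ordered            # odds never reach 1: keep everything (assumed convention)


def brute_force_best(ps):
    """Maximum of F over all non-empty subsets, by exhaustive search."""
    return max(
        F(list(sub))
        for r in range(1, len(ps) + 1)
        for sub in combinations(ps, r)
    )


if __name__ == "__main__":
    ps = [0.4, 0.3, 0.3]
    print(threshold_subset(ps), F(threshold_subset(ps)), brute_force_best(ps))
```

The exhaustive search is only there to confirm the rule on small inputs. The threshold rule itself is a single pass over the sorted list.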

Sunday, July 5, 2020

Optimization from random constraints

This week I heard a very nice talk from my classmate Huajie Qian. The question can be stated as solving \min_{x} c \cdot x for x \in \mathbb R^d under the constraint h(x, \zeta) \leq 0, where \zeta is a random variable. A very naive method is the following: given data \{\zeta_i\}_{1 \leq i \leq n}, solve the problem so that all n sampled constraints are satisfied. For the resulting solution x, this gives the nice estimate
\mathbb P_{\{\zeta_i\}_{1 \leq i \leq n}}\left[\mathbb P_{\zeta}(h(x,\zeta) \leq 0) \leq 1- \alpha\right] \leq {n \choose d}(1-\alpha)^{n-d},
which is a very nice result.
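
To get a feeling for the numbers, here is a small sketch that evaluates the right-hand side {n \choose d}(1-\alpha)^{n-d} and searches for the first sample size at which it drops below a target level. The names `scenario_bound` and `samples_needed` and the level `delta` are my own notation, not from the talk.

```python
from math import comb


def scenario_bound(n, d, alpha):
    """Right-hand side C(n, d) * (1 - alpha)^(n - d) of the estimate."""
    return comb(n, d) * (1 - alpha) ** (n - d)


def samples_needed(d, alpha, delta):
    """First n >= d at which the bound is <= delta (plain linear search)."""
    n = d
    while scenario_bound(n, d, alpha) > delta:
        n += 1
    return n


if __name__ == "__main__":
    for d in (1, 2, 5):
        print(d, samples_needed(d, alpha=0.1, delta=0.01))
```

The bound equals 1 at n = d and then decays geometrically up to a polynomial factor, so the required n grows roughly linearly in d for fixed \alpha.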

The key to proving the result above is a very nice lemma: in fact, one can find d constraints among the n that already solve the problem. That is to say, only d constraints matter.
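
A toy illustration of the lemma in dimension d = 1, under assumptions of my own choosing: take c = 1 and h(x, \zeta) = \zeta - x, so the sampled problem is to minimise x subject to x \geq \zeta_i for all i. The solution is the largest sample, and a single constraint determines it. The function names are hypothetical.

```python
def solve_sampled(zetas):
    """Minimise x subject to zeta_i - x <= 0 for every sample: the answer is max(zetas)."""
    return max(zetas)


def support_constraints(zetas):
    """Indices whose removal changes the optimal value (needs at least 2 samples)."""
    full = solve_sampled(zetas)
    return [
        i for i in range(len(zetas))
        if solve_sampled(zetas[:i] + zetas[i + 1:]) != full
    ]


if __name__ == "__main__":
    zetas = [0.3, 0.9, 0.1, 0.5]
    print(solve_sampled(zetas), support_constraints(zetas))
```

Here exactly one constraint matters, in agreement with d = 1. In higher dimension one would need a real linear programming solver, which this sketch deliberately avoids.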

To prove it, we need two theorems: Radon's theorem and Helly's theorem. The first one says that in the space \mathbb R^d, any (d+2) points can be partitioned into two sets I_1, I_2 whose convex hulls have a non-empty intersection. The second one says that for convex sets \{C_i\}_{1 \leq i \leq m} in \mathbb R^d, if any (d+1) of them have a non-empty intersection, then \cap_{i \leq m} C_i is non-empty.

Using the two theorems, we suppose for contradiction that keeping any d constraints always gives a strictly better minimiser. We then project onto the hyperplane given by the direction c and apply Helly's theorem there to reach a contradiction.

Saturday, July 4, 2020

Maximal correlation

This is a question about correlation. Let X, Y be two Gaussian random variables \mathcal{N}(0,1) with correlation \beta. Prove that the best constant in the Cauchy-Schwarz inequality is \vert \beta \vert, that is,
Cov(f(X), g(Y)) \leq \vert \beta \vert \sqrt{Var(f(X)) Var(g(Y))}.

In fact, one can define the maximal correlation of two random variables as the best constant above, and of course it is at least \vert \beta \vert (take f(x) = x and g(y) = \pm y). Let us remark how to prove the inequality above quickly. We use the expansion in Hermite polynomials, for which
\mathbb{E}[H_n(X) H_m(Y)] = \delta_{n,m} \, n! \left(\mathbb{E}[XY]\right)^n = \delta_{n,m} \, n! \, \beta^n.
A centered L^2 function has zero projection on H_0. Writing \langle f, H_n \rangle = \mathbb{E}[f(X) H_n(X)], so that f = \sum_{n} \frac{1}{n!} \langle f, H_n \rangle H_n, and using \vert \beta \vert^n \leq \vert \beta \vert for n \geq 1 together with the Cauchy-Schwarz inequality, we have for centered f and g
\mathbb{E}[f(X)g(Y)] = \sum_{n=1}^{\infty}\langle f, H_n \rangle \langle g, H_n \rangle \frac{1}{n!} \beta^n \\ \leq \vert \beta \vert \sqrt{\sum_{n=1}^{\infty}\langle f, H_n \rangle^2 \frac{1}{n!} } \sqrt{\sum_{n=1}^{\infty}\langle g, H_n \rangle^2 \frac{1}{n!} } \\ = \vert \beta \vert \sqrt{Var(f(X)) Var(g(Y))}.
This concludes the proof.
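
The orthogonality relation for Hermite polynomials can be checked numerically. The sketch below is my own. It assumes the probabilists' normalisation H_0 = 1, H_1 = x, H_{n+1} = x H_n - n H_{n-1}, writes Y = \beta X + \sqrt{1-\beta^2} Z with Z independent of X, and computes \mathbb{E}[H_n(X) H_m(Y)] from Gaussian moments. With this normalisation the expected value is \delta_{n,m} \, n! \, \beta^n.

```python
from math import comb, factorial, sqrt


def hermite(n):
    """Coefficients (lowest degree first) of the probabilists' Hermite polynomial H_n."""
    prev, cur = [0.0], [1.0]              # H_{-1} := 0, H_0 = 1
    for k in range(n):
        nxt = [0.0] + cur                 # x * H_k
        for i, c in enumerate(prev):
            nxt[i] -= k * c               # minus k * H_{k-1}
        prev, cur = cur, nxt
    return cur


def gauss_moment(p):
    """E[X^p] for X ~ N(0,1): (p-1)!! when p is even, 0 when p is odd."""
    if p % 2:
        return 0.0
    m = 1.0
    for k in range(1, p, 2):
        m *= k
    return m


def mixed_moment(i, j, beta):
    """E[X^i Y^j] with Y = beta*X + sqrt(1 - beta^2)*Z and Z independent of X."""
    gamma = sqrt(1 - beta * beta)
    return sum(
        comb(j, k) * beta ** k * gamma ** (j - k)
        * gauss_moment(i + k) * gauss_moment(j - k)
        for k in range(j + 1)
    )


def hermite_cross(n, m, beta):
    """E[H_n(X) H_m(Y)] computed from the polynomial coefficients."""
    return sum(
        a * b * mixed_moment(i, j, beta)
        for i, a in enumerate(hermite(n))
        for j, b in enumerate(hermite(m))
    )


if __name__ == "__main__":
    beta = 0.6
    for n in range(4):
        print(n, hermite_cross(n, n, beta), factorial(n) * beta ** n)
```

Since everything is computed from exact moment formulas rather than by simulation, the agreement is up to floating point rounding only.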

Friday, June 26, 2020

Strong Markov property for Brownian Bridge

The strong Markov property for Brownian motion is well known, and it is naturally also true for the Brownian Bridge. In fact, since a Brownian Bridge ending at (T, 0) can be thought of as a Brownian Motion conditioned on its endpoint, or as a perturbation of a linear interpolation, the property is natural when we restart from another mid-point.

To prove it rigorously, starting with the Markov property for the Brownian Bridge, we have to do some calculus. Let (W_t)_{t \geq 0} be the standard Brownian motion issued from 0, and let (B_t)_{s \leq t \leq T} be defined as
B_t = x + W_t - W_s - \frac{t-s}{T-s}(x + W_T -W_s) ,
a Brownian Bridge on [s, T] whose value is x at time s and 0 at time T. One way to see this formula is that the term x + W_t - W_s is a Brownian Motion started from x at time s, from which we subtract a linear correction so that the value at the endpoint T vanishes. Some simple calculus shows that it is equal to
B_t = \frac{T-t}{T-s}(x + W_t - W_s) - \frac{t-s}{T-s}(W_T - W_t).

The Markov property is very simple but requires a computation: we would like to show that for s < r < t < T we have
B_t = B_r + W_t - W_r - \frac{t-r}{T-r}(B_r + W_T -W_r). \quad (\star) 
Now we prove it. An intermediate step, using B_r + W_T - W_r = \frac{T-r}{T-s}(x + W_T - W_s), tells us
 - \frac{t-r}{T-r}(B_r + W_T -W_r) = - \frac{t-r}{T-s}(x + W_T - W_s),
and we put it into the formula to get
RHS (\star) = x + W_r - W_s - \frac{r-s}{T-s}(x + W_T -W_s) + W_t - W_r - \frac{t-r}{T-s}(x + W_T - W_s)\\ = x + W_t - W_s - \frac{r-s}{T-s}(x + W_T -W_s) - \frac{t-r}{T-s}(x + W_T - W_s) \\ = x + W_t - W_s - \frac{t-s}{T-s}(x + W_T -W_s). 
This proves the Markov property. The strong Markov property then follows by approximating the stopping time with discrete ones and using the regularity of the trajectory.
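
The identity (\star) is purely algebraic, so it can be checked on any fixed values of the path. In the sketch below, which is my own, the dictionary `W` holds arbitrary made-up values of W at the four times s < r < t < T. No randomness is involved. The restarted bridge is just the same formula applied with (r, B_r) in place of (s, x).

```python
def bridge(t, s, T, x, W):
    """B_t = x + W_t - W_s - (t - s)/(T - s) * (x + W_T - W_s).

    W is a dict mapping times to the values of the driving path at those times.
    """
    return x + W[t] - W[s] - (t - s) / (T - s) * (x + W[T] - W[s])


if __name__ == "__main__":
    s, r, t, T = 0.5, 1.0, 1.5, 2.0
    x = 0.8
    W = {s: 0.3, r: -0.7, t: 1.2, T: 0.4}    # arbitrary illustrative path values

    B_r = bridge(r, s, T, x, W)
    lhs = bridge(t, s, T, x, W)              # bridge started at (s, x)
    rhs = bridge(t, r, T, B_r, W)            # bridge restarted at (r, B_r)
    print(lhs, rhs)
```

Both printed numbers agree up to rounding, which is exactly the statement that the bridge restarted at time r from B_r, driven by the same W, coincides with the original one.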