A sweet probabilistic proof that zeta(2)=…

Here is a delightful probabilistic proof that {\zeta(2)=\frac{\pi^2}{6}} extracted from a recent article by Greg Markowsky. This starts with pretty lemma (which can originally be found in this article) on the exit time of a {2} dimensional Brownian motion {B_t}. Suppose that {f:\mathbb{C} \rightarrow \mathbb{C}} is an analytic function on the neighbourhood of the unit disk. This function maps the unit disk {\mathbb{D}} to {f(\mathbb{D})} with boundary {\partial f(\mathbb{D}) = f(\partial \mathbb{D})} where {\partial \mathbb{D}= \big\{e^{i\theta} : 0 \leq \theta \leq 2 \pi \big\}}. A two dimensional brownian motion started at {f(0)} takes on average

\displaystyle \mathbb{E}[ \, \tau \, ] \; = \; \sum_{k \geq 1} |a_k|^2

to exit the domain {f(\mathbb{D})} where {f(z) = \sum_{k \geq 0} a_k z^k} and {\tau = \inf \big\{ \, t>0: B_t \in \partial f(\mathbb{D}) \, \big\}} is the hitting time of the boundary {\partial f(\mathbb{D})}. Indeed, since the situation is invariant by translation one can always suppose that {f(0)=a_0=0}. The proof of this result is a simple martingale argument. Indeed, since {M_t = \|B_t\|^2 - 2t} is a martingale, the optional stopping theorem shows that

\displaystyle \mathbb{E}[ \, \|B_{\tau}\|^2 \, ] \; = \; 2 \mathbb{E}[\, \tau \, ].

Then, since {f(\cdot)} is an analytic function, Ito’s formula shows that the trajectories of {B_t} are the same as the trajectories of {f(W_t)}, up to a time change, where {W_t} is another {2} dimensional Brownian motion started at {z=0}. Consequently, {B_{\tau} = f(Z_{\rho})} where {\rho} is the hitting time {\rho = \inf \big\{ \, t>0: W_t \in \partial \mathbb{D} \, \big\}}. This shows that

\displaystyle \mathbb{E}[ \, \|B_{\tau}\|^2 \, ] = \mathbb{E}[ \, \|f(W_{\rho})\|^2 \, ] = \frac{1}{2\pi} \int_{\theta=0}^{2\pi} |f(e^{i\theta})|^2 \, d\theta.

Since Parseval‘s theorem (or a direct calculation) shows that the last quantity also equals {\sum_{k \geq 0} |a_k|^2}, the conclusion follows.

 

To find a nice identity, it thus suffices to find an easy domain where one can compute explicitly the brownian exit time {\mathbb{E}[ \, \tau \, ]}. The first thing that comes to mind is a strip {S_a = \big\{ x+iy \, : \, -a < y < a\big\}} since it is classical that a Brownian motion started at {z=0} takes on average {T=a^2} to exit the strip {S_a}. Since {f(z) = \log\big( \frac{1-z}{1+z} \big) = -2(z+z^3/3+z^5/5 + z^7/7 + \ldots)} maps the unit disk {\mathbb{D}} to the strip {S_{\frac{\pi}{2}}} it follows that

\displaystyle \frac{\pi^2}{4} = 2(1+3^{-2} + 5^{-2} + 7^{-2} + \ldots),

which is indeed equivalent to the celebrated identity {\zeta(2) = \frac{\pi^2}{6}}.

Advertisements

Conditioned Brownian motion, 1st Laplacian eigenfunction, etc…

Brownian Motion

Brownian Motion

A few days ago, a student asked the following question:

Consider a bounded domain {D \subset \mathbb{R}^d} and a Brownian motion conditioned to stay inside {D}. How does it look like ? What is the invariant distribution ?

This is very simple, and surprisingly interesting. This involves the first eigenfunction of the Laplacian on {D}. To make everything simple, and because this does not change anything, it suffices to study the situation where {D=[0,1] \subset \mathbb{R}}. In other words, what does a real Brownian motion conditioned to stay inside the segment {[0,1]} look like.

Discretisation

One could directly do the computations in a continuous setting but, as it is often the case, it is simpler to consider the usual random walk discretisation of a Brownian motion. For this purpose, consider a time discretisation {\delta>0} and a standard random walk {X_k} with increment {\pm \sqrt{\delta}}: with probability {\frac12} the random walk goes up {+\sqrt{\delta}} and with probability {\frac12} it goes down {-\sqrt{\delta}}. For clarity, assume that there exists an integer {m_{\delta} \in \mathbb{N}} such that {m_{\delta} \sqrt{\delta} = 1}. Suppose as well that it is conditioned on the event {X_k \in [0,1]} for {k=1, \ldots, N}. Later, we will consider the limiting case {N \rightarrow \infty} and {\delta \rightarrow 0}, in this order, to recover the Brownian motion case.

Conditioning

First, let us compute the probability transitions of the random walk {\{X_k\}_{k=0}^N} conditioned on the event {X_k \in [0,1]} for {k=1, \ldots, N}. For clarity, let us denote the conditioned random walk by {\hat{X} = \big\{ \hat{X}_k \big\}_{k \geq 0}}. This is a Doob h-transform, and the resulting process is a non-homogenous Markov chain. In this simple case the computations are straightforward. The conditioned Markov chain has transition probabilities given by {\hat{P}(k,x,y) = \mathbb{P}[\hat{X}_{k+1}=y \,|\hat{X}_k=x]} with {x,y \in D_{\delta}} where {D_{\delta} = \sqrt{\delta} \mathbb{N} \; \cap \; [0,1]}. One can compute the probability that the conditioned Markov chain follows a given trajectory {(x_0, \ldots, x_N)} of {D_{\delta}},

\displaystyle  \begin{array}{rcl}  \mathbb{P}[\hat{X}_0 = x_0, \ldots, \hat{X}_N=x_N] = \frac{1}{Z(N,\delta)} P(x_0,x_1) \times \ldots \times P(x_{N-1}, x_N) \end{array}

where {P(x,y)} is the transition kernel of the unconditioned Markov chain and {Z(N,\delta) = \mathbb{P}[X_k \in D_{\delta} \; \text{for} \; k=0, \ldots, N]} is a normalisation constant. The Doob h-transform simply consists in noticing that this also reads

\displaystyle  \begin{array}{rcl}  \mathbb{P}[\hat{X}_0 = x_0, \ldots, \hat{X}_N=x_N] = \hat{P}(0,x_0,x_1) \times \ldots \times \hat{P}(N-1,x_{N-1}, x_N) \end{array}

where the conditioned Markov kernel is {\hat{P}(k,x_k,x_{k+1}) = \frac{P(x,y) \, h(k+1,y)}{h(k,x)}} and the function {h(\cdot, \cdot)} is defined by

\displaystyle  \begin{array}{rcl}  h(k,x) = \mathbb{P}[X_{j} \in D_{\delta} \; \text{for} \; k+1 \leq j \leq N \, |X_k=x]. \end{array}

Of course we have {h(N,x)=1} for all {x \in D_{\delta}}. The quantity {h(x,k)} is the probability that a random walk starting at {x \in D_{\delta}} at time {k} remains inside {D_{\delta}} for time {k,k+1, \ldots, N}. Consequently, in order to find the transition probabilities of the conditioned kernel, it suffices to compute the quantities {h(k, x)} for all {x \in D_{\delta}}. Since we are interested in the limiting case {N \rightarrow \infty}, it actually suffices to consider the case {k=0}. It can be computed recursively since {h(k,x) = \frac12 \, h(k+1, x+\sqrt{\delta}) + \frac12 \, h(k+1, x-\sqrt{\delta})} with the appropriate boundary conditions. In other words, adopting the obvious matrix notations, the vector {h(k,\cdot)} satisfies

\displaystyle  \begin{array}{rcl}  h(k,\cdot) = A h(k+1, \cdot) \end{array}

where {\big( A_{i,j} \big)_{0 \leq i,j \leq m_{\delta}}} is the usual tridiagonal matrix given by {A_{i,j} = 1} if, and only if, {|i-j|=1} and {A_{i,j} = 0} otherwise. It is related to the discrete Laplacian operator. Indeed, because all the eigenvalues of {A} are real with modulus strictly inferior to {1}, it follows that {h(0,\cdot) = A^{N} \textbf{1} \approx \lambda^N \, \varphi_{\delta}(\cdot)} where {\lambda_{\delta}} is the highest eigenvalue of {A} and {\varphi_{\delta}(\cdot)} the associated eigenfunction. The eigenvalues of {A} are well-known, and as {\delta \rightarrow 0}, the highest eigenvalue converges to {1} and the associated eigenfunction converges to the first eigenfunction of the Laplacian on the domain {D=[0,1]} with Dirichlet boundary. In our case it is {\varphi(u) = \sin(\pi \, u)} and

\displaystyle  \begin{array}{rcl}  \lim_{N \rightarrow \infty} \frac{ h(1,x \pm \sqrt{\delta})}{h(0,x)} = \lambda_{\delta}^{-1} \frac{ \varphi_{\delta}(x \pm \sqrt{\delta})}{\varphi_{\delta}(x)}. \end{array}

In other words, the random walk with increments {\pm \sqrt{\delta}} conditioned to stay inside {D_{\delta}} has probability transitions given by

\displaystyle  \begin{array}{rcl}  \hat{P}(x, x \pm \sqrt{\delta}) = \frac{{\lambda_{\delta}}^{-1}}{2} \frac{ \varphi_{\delta}(x \pm \sqrt{\delta})}{\varphi_{\delta}(x)}. \end{array}

Next section investigates the limiting case {\delta \rightarrow 0}.

Conclusion

We have computed the dynamics of the conditioned random walk with space-increments {\sqrt{\delta}}. To obtain the dynamics of the conditioned Brownian motion it suffices to consider the limiting case {\delta \rightarrow 0}. The drift of the resulting diffusion is given by

\displaystyle  \begin{array}{rcl}  \text{(drift)} &=& \lim_{\delta \rightarrow 0} \frac{1}{\delta} \Big\{ \hat{P}(0,x,x+\sqrt{\delta})\sqrt{\delta} - \hat{P}(0,x,x-\sqrt{\delta})\sqrt{\delta} \Big\} \\ &=& \lim_{\delta \rightarrow 0} \frac{{\lambda_{\delta}}^{-1}}{2 \sqrt{\delta}} \Big\{ \frac{ \varphi_{\delta}(x + \sqrt{\delta})}{\varphi_{\delta}(x)} - \frac{ \varphi_{\delta}(x - \sqrt{\delta})}{\varphi_{\delta}(x)} \Big\} = \frac{1}{2} \frac{\varphi'(x)}{\varphi(x)}. \end{array}

The same computation gives the volatility of the resulting diffusion. It is given by

\displaystyle  \begin{array}{rcl}  \text{(volatility)} &=& \lim_{\delta \rightarrow 0} \frac{1}{\delta} \Big\{ \hat{P}(0,x,x+\sqrt{\delta})\delta + \hat{P}(0,x,x-\sqrt{\delta})\delta \Big\} \\ &=& \lim_{\delta \rightarrow 0} \frac{{\lambda_{\delta}}^{-1}}{2} \Big\{ \frac{ \varphi_{\delta}(x + \sqrt{\delta})}{\varphi_{\delta}(x)} + \frac{ \varphi_{\delta}(x - \sqrt{\delta})}{\varphi_{\delta}(x)} \Big\} = 1. \end{array}

As the consequence, this shows that a Brownian motion conditioned to stay inside {D=[0,1]} follows the stochastic differential equation {dX = \frac{1}{2} \frac{\varphi'(X)}{\varphi(X)} \, dt + dW_t} where {\varphi(\cdot)} is the first eigenfunction of the Laplacian on {D} with Dirichlet boundary conditions. More generaly, the same argument would show that a Brownian motion in {\mathbb{R}^d} conditioned to stay inside a nice bounded domain {D \subset \mathbb{R}^d} evolves according to the stochastic differential equation

\displaystyle  \begin{array}{rcl}  dX = \frac{1}{2} \frac{\nabla \varphi(X)}{\varphi(X)} \, dt + dW_t \end{array}

where {\varphi:D \rightarrow \mathbb{R}} is the first eigenfunction of the Laplacian on {D}. This is a Langevin diffusion and one can immediately see that the invariant distribution of this diffusion is given by

\displaystyle  \begin{array}{rcl}  \pi_{\infty}(x) \, \propto \varphi(x). \end{array}

For example, the following plot depicts the first eigenfunction of the Laplacian on the domain {D=[0,1]^2 \subset \mathbb{R}^2}.

First Eigenfunction

Doob H-transforms

I read today about Doob h-transforms in the Rogers-Williams … It is done quite quickly in the book so that I decided to practice on some simple examples to see how this works.

So we have a Markov process {X_t} living in the state space {S}, and we want to see how this process looks like if we condition on the event {X_T \in A} where {A} is a subset of the state space. To fix the notations we define {p(t,t+s,x,y) = P(X_{t+s}=y|X_t=x)} and {h(t,x)=P(X_T \in A \, | X_t=x)}. The conditioned semi-group {\hat{p}(t,t+s,x,y)=P(X_{t+s}=y|X_t=x, X_T \in A)} is quite easily computed from {p} and {h}. Indeed, this also equals

\displaystyle \hat{p}(t,t+s,x,y) = \frac{P(X_{t+s}=y; X_T \in A\;|X_t=x)}{P(X_T \in A \,|X_t=x)} = p(t,t+s,x,y) \frac{h(t+s,y)}{h(t,x)}.

Notice also that {\hat{p}(t,t+s,x,y) = p(t,t+s,x,y) \frac{h(t+s,y)}{h(t,x)}} is indeed a Markov kernel in the sense that {\int_{y} \hat{p}(t,t+s,x,y) \, dy = 1}: the only property needed for that is

\displaystyle  h(t,x) = \int_{y} p(t,t+s,x,y)h(t+s,y)\,dy = E\left[ h(t+s,X_{t+s}) \, |X_t=x\right].

In fact, we could take any function {h} that satisfies this equality and define a new Markovian kernel {\hat{p}} and study the associated Markov process. That’s what people usually do by the way.

Remark 1 we almost never know explicitly the quantity {h(t,x)}, except in some extremely simple cases !

Before trying these ideas on some simple examples, let us see what this says on the generator of the process:

  1. continuous time Markov chains, finite state space:let us suppose that the intensity matrix is {Q} and that we want to know the dynamic on {[0,T]} of this Markov chain conditioned on the event {X_T=z}. Indeed {p(t,t+s,i,j) = [\exp(sQ)]_{i,j}} so that {\hat{p}(t,t+s,i,j) = [\exp(sQ)]_{i,j} \frac{p(t+s,T,j,z)}{p(t,T,i,z)}} so that in the limit we see that at time {t}, the intensity of the jump from {i} to {j} of the conditioned Markov chain is
    \displaystyle  Q(i,j) \frac{p(t+s,T,j,z)}{p(t,T,i,z)}.

    Notice how this behaves while {t \rightarrow T}: if at {t=T-\epsilon} the Markov chain is in state {i \neq z} then the intensity of jump from {i} to {z} is equivalent to {\approx \frac{1}{\epsilon}}.

  2. diffusion processes:this time consider a {1}-dimensional diffusion {dX_t = \mu(X_t) \, dt + \sigma(X_t) \, dW_t} on {[0,T]} conditioned on the event {X_T \in A} and define as before {h(t,x)=P(X_T \in A \,|X_t=x)}. The generator of the (non-homogeneous) conditioned diffusion is defined at time {t} by
    \displaystyle  \begin{array}{rcl}  \mathcal{G}^{(t)} f(x) &=& \lim_{s \rightarrow 0} \frac{1}{s} \Big( E\left[f(X_{t+s}) \,| X_t=x, X_T \in A\right]-f(x) \Big)\\ &=& \lim_{s \rightarrow 0} \frac{E\left[ f(X_{t+s}) h(t+s, X_{t+s}) \,| X_t=x\right]-f(x) }{s\,h(t,x)} \end{array}

    so that if {\mathcal{L} = \mu \partial_x + \frac{1}{2} \sigma^2 \partial^2_{xx}} is the generator of the original diffusion we get

    \displaystyle  \mathcal{G}^{(t)} f = \frac{1}{h} \Big(\partial_t + \mathcal{L})(hf).

    Because {(\partial_t + \mathcal{L})h=0}, this also reads

    \displaystyle  \mathcal{G}^{(t)} f = \mathcal{L}f + \sigma^2 \frac{\partial_x h}{h} \partial_x f.

    This means that the conditioned diffusion {Z} follows the SDE:

    \displaystyle  dZ_t = \Big( \mu(Z_T) + \sigma(Z_t)^2 \frac{\partial_x h(t,Z_t)}{h(t,Z_t)} \Big) \, dt+ \sigma(Z_t) dW_t.

    The volatility function remains the same while an additional drift shows up.

We will try these ideas on some examples where the probability densities are extremely simple. Notice that in the case of diffusions, if we take {A=\{x^+\}}, the function {(t,x) \mapsto P(X_T = x^+ \, |X_t=x)} is identically equal to {0} (except degenerate cases): to condition on the event {X_T=x} we need instead to take {h(t,x)} to be the transition probability {p(t,T,x,x^+)}. This follows from the approximation {P(X_T \in (x^+,x^+ + dx) \,|X_t=x ) \approx p(t,T,x,x^+) \, dx + o(dx)}. Let’s do it:

  • Brownian Bridge on {[0,T]}:in this case {p(t,T,x,0) \propto e^{-\frac{|x-y|^2}{2(T-t)}}} so that the additional drift reads {\frac{-x}{T-t}}: a Brownian bridge follows the SDE
    \displaystyle  dX_t = -\frac{X_t}{T-t} \, dt + dW_T.

    This might not be the best way to simulate a Brownian bridge though!

  • Poisson Bridge on {[0,T]}:we condition a Poisson process of rate {\lambda} on the event {X_T=N}. The intensity matrix is simply {Q(k,k+1)=\lambda=-Q(k,k)} and {0} everywhere else while the transition probabilities are given by {p(t,T,k,N) = e^{-\lambda (T-t)} \frac{(\lambda (T-t) )^{N-k}}{(N-k)!}}. This is why at time {t}, the intensity from {k} to {k+1} is given by
    \displaystyle  \lambda(t,k,k+1) = \frac{N-k}{T-t}.

    Again, that might not be the most efficient way to simulate a Poisson Bridge ! Notice how the intensity {\lambda} has disappeared …

  • Ornstein-Uhlenbeck Bridge:Let’s consider the usual OU process given by the dynamic {dX_t = -X_t + \sqrt{2}dW_t}: the invariant probability is the usual centred Gaussian distribution. Say that we want to know how does such an OU process behave if we condition on the event {X_T = z}. Because {p(t,T,x,z) \propto \exp(-\frac{|z-e^{-(T-t)}x|^2}{2(1-e^{-2(T-t)})^2})} we find that the conditioned O-U process follows the SDE
    \displaystyle dX_t = \Big(\frac{z-e^{-(T-t)x}}{e^{T-t}(1-e^{-2(T-t)})^2} - X_t \Big)\, dt+ \sqrt{2} \, dW_t.

    If we Taylor expand the additonal drift, it can be seen that this term behaves exactly as in the case of the Brownian bridge. Below is a plot of an O-U process conditioned on the event {X_{10} = 10}, starting from {X_0=0}.

    conditioned O-U process

    conditioned O-U process

Brownian particles on a circle

Today, James norris gave a talk related to Diffusion-limited aggregation processes and mentioned, in passing, the following amusing fact: put {N} equidistant Brownian particles {W_1, \ldots, W_N} on the circle with unit circumference and let them evolved. When two of them collide they get stuck to each other and continue together afterwards: after a certain amount of time {T_N}, only one particle remains. Perhaps surprisingly, this is extremely easy to obtain the first few properties of {T_N}. For example, {\lim_N \mathop{\mathbb E}\left[ T_N \right] = \frac{1}{6}}.

To see that, define {D_k} the distance between {W_k} and {W_{k+1}} (modulo {N}) so that {D_k = \frac{1}{N}} for {k=1,2 \ldots, N}. Notice then (It\^o’s formula) that

\displaystyle  M_t = e^{\lambda t} \sum_{k=1^N} \sin(\sqrt{\lambda} D_k)

is a (local) martingale that starts from {M_0 = N \sin(\frac{\sqrt{\lambda}}{N})}. Also, at time {T_N}, exactly {N-1} of the distances {D_1, D_2, \ldots, D_N} are equal to {0} while one of them is equal to {1}: this is why {M_{T_N} = e^{\lambda T_N} \sin(\sqrt{\lambda})}. The end is clear: apply the optional sampling theorem (to be rigorous, take {\lambda} not too big, or do some kind of truncations to be sure that the optional sampling theoem applies) to conclude that

\displaystyle  \mathop{\mathbb E}\left[ e^{\lambda T_N} \right] = \frac{N \sin(\frac{\sqrt{\lambda}}{N})}{\sin(\sqrt{\lambda})}.

This gives for example {\mathop{\mathbb E}\left[ T_N \right] = \frac{1}{6}(1-\frac{1}{N^2})}. I just find it cute!

So what if we do that on a segment ?