Fluid Limits and Random Graphs

The other day Christina Goldschmidt gave a very nice talk on results obtained by James Norris and R. W. R. Darling: in their paper they use fluid limit techniques in order to study properties a Random graphs. This approach has long been used to analyse queuing systems for example.

1. Erdos Renyi Random Graph model

Remember that the Erdos Renyi random graph {\mathcal{G}(N,p)} is a graph with {N} vertices {\{1,2, \ldots, N\}} such that any two of them are connected with probability {p}, independently.

Erdos-Renyi random graph model

Erdos-Renyi random graph model

This model of random graphs has been extensively studied and this has long been known that {G(N,\frac{c}{N})} undergoes a phase transition at {c=1}: for {c<1}, the largest component {L^{(N,c)}} of {\mathcal{G}(N,\frac{c}{N})} has a size of order {\log(N)} while for {c>1} the largest connected component {L^{(N,c)}}, also called the giant component, has size of order {N}. In another blog entry, another consequence of this phase transition has been used.

One can even show that for {c>1} there exists a constant {K_c} such that for any {\epsilon > 0},

\displaystyle  \lim_{N \rightarrow \infty} \; \; \mathop{\mathbb P}\Big( |\frac{\textrm{Card}(L^{(N,c)})}{N} - K_c| > \epsilon \Big) = 0.

Simply put, the largest connected component of {\mathcal{G}(N,\frac{c}{N})} has size approximatively equal to {K_c \cdot N}. Maybe surprisingly, this is quite simple to compute the value of {K(c)} through fluid limit techniques.

2. Exploration of the graph

Let {G} be a configuration of {\mathcal{G}(N, \frac{c}{N})}. This configuration is explored through a classical depth-first algorithm. To keep track of what is doing, a process {Z_k: \{1, \ldots, N\} \rightarrow \mathbb{Z}} is introduced. This process captures essentially all the information needed to study the size of the connected component: it is then shown that {Z} behaves essentially deterministically when looked under the appropriate scale.

  1. initilisation: t=0: Color all the vertices in green, the current vertex is {v=1} and define {Z_1=0}.
  2. t=t+1: Color the current vertex in red (=visited): if a vertex is connected to {v} and is green, color it in grey ( this vertex has still to be visited, but its ‘parent’ has already been visited): these are the children of {v}.
  3. update {Z} the following way, {Z_t = Z_{t-1} + (\textrm{Nb of children of }v) - 1}, and
    • if {v} has children, choose one of them and make it the new current vertex. Go to {2}.
    • If {v} has no children then the new current vertex is its parent. Go to {2}.
    • if {v} has no children and no parent, the new current vertex is one of the remaining (if any) green vertex. Go to {2}
    • else: we are finished.

depth first exploration

A moment of though reveals that {Z_1=0} and {Z} reaches {-1} precisely when the connected component of the vertex {1} has all been explored. Then {Z} reaches {-2} precisely when the second connected component has all been explored, etc … This is why if {T_0=1} and {T_k = \inf \{t: Z_t = -k\}} then the connected components have size exactly {S_k = T_{k+1}-T_k} for {k=0,1,2, \ldots}. The largest component has size {\max_k S_k}.

Z process

3. Fluid Limit

It is quite easy to show why the rescaled process {z^N(t) = \frac{ Z([Nt])}{N}} should behave like a differential equation (ie: fluid limit). While exploring the first component, {Z_k} represents the number of gray vertices at time {k} so that {N-k-Z_k} represents the number of green vertices: this is why

\displaystyle  \mathop{\mathbb E}\Big[Z_{k+1}-Z_k \;|Z_k \Big] = (N-k-Z_k) \frac{c}{N} = c\Big(1-\frac{k}{N} - \frac{Z_k}{N} \Big) - 1

so that {\mathop{\mathbb E}\Big[ \frac{z^N(t+\Delta)-z^N(t)}{\Delta} | z^N(t) \Big] \approx c\Big(1-t-z(t) \Big) - 1} where {\Delta = \frac{1}{N}}. Also, the variance statisfies {\text{Var}\Big[ \frac{z^N(t+\Delta)-z^N(t)}{\sqrt{\Delta}} | z^N(t) \Big] \approx N^{-\frac{1}{2}} c\Big(1-t-z(t) \Big) \rightarrow 0} so that all the ingredients for a fluit limit are present: {z^N} converges to the differential equation

\displaystyle  \frac{d}{dt} z(t) = c\Big(1-t-z(t) \Big) - 1

whose solution is {z(t) = (1-t)-e^{-ct}}. This is why {K=K_c > 0} is implicitly defined as the solution of

\displaystyle  1-K_c = e^{-c K_c}.

As expected, {\lim_{c \rightarrow 1^+} K_c = 0} and {\lim_{c \rightarrow +\infty} K_c = 1}.

Fluid Limit

Is this random ? Card shuffling, coins and their friends …

A brilliant talk by Persi Diaconis !

Potts model and Monte Carlo Slow Down

A simple model of interacting particles

The mean field Potts model is extremely simple: there are {N} interacting particles {x_1, \ldots, x_N} and each one of them can be in {q} different states {1,2, \ldots, q}. Define the Hamiltonian

\displaystyle  H_N(x) = -\frac{1}{N} \sum_{i,j} \delta(x_i, x_j)

where {x=(x_1, \ldots, x_N)} and {\delta} is the Kronecker symbol. The normalization {\frac{1}{N}} ensures that the energy is an extensive quantity so that the mean energy per particle {h_N(x) = \frac{1}{N} H_N(x)} does no degenerate to {0} or {+\infty} for large values of {N}. The sign minus is here to favorize configurations that have a lot of particles in the same state. The Boltzman distribution at inverse temperature {\beta} on {\{1, \ldots, q\}^N} is given by

\displaystyle P_{N,\beta} = \frac{1}{Z_N(\beta)} e^{-\beta H_N(x)}

where {Z_N(\beta)} is a normalization constant. Notice that if we choose a configuration uniformly at random in {\{1, \ldots, q\}^N}, with overwhelming probability the ratio of particles in state {k} will be close to {\frac{1}{q}}. Also it is obvious that if we define

\displaystyle  L^{(N)}_k(x) = \frac{1}{N} \, \Big( \textrm{Number of particles in state }k \Big)

then {L=(L^{(N)}_1, \ldots, L^{(N)}_q)} will be close to {(\frac{1}{q}, \ldots, \frac{1}{q})} for a configuration taken uniformly at random. Stirling formula even says that the probability that {L} is close to {\nu = (\nu_1, \ldots, \nu_q)} is close to {e^{-N \, R(\nu)}} where

\displaystyle R(\nu) = \nu_1 \ln(q\nu_1) + \ldots + \nu_q \ln(q\nu_q).

Indeed {(\frac{1}{q}, \ldots, \frac{1}{q}) = \textrm{argmin} \, R(\nu)}. The situation is quite different under the Boltzman distribution since it favorizes the configurations that have a lot of particles in the same state: this is because the Hamiltonian {H_N(x)} is minimized for configurations that have all the particles in the same state. In short there is a competition between the entropy (there are a lot of configurations close to the ratio {(\frac{1}{q}, \ldots, \frac{1}{q})}) and the energy that favorizes the configurations where all the particles are in the same state.

With a little more work, one can show that there is a critical inverse temperature {\beta_c} such that:

  • for {\beta < \beta_c} the entropy wins the battle: the most probable configurations are close to the ratio {(\frac{1}{q}, \ldots, \frac{1}{q})}
  • for {\beta > \beta_c} the energy effect shows up: there are {q} most probable configurations that are the permutations of {(a_{\beta},b_{\beta},b_{\beta}, \ldots, b_{\beta})} where {a_{\beta}} and {b_{\beta}} are computable quantities.

The point is that above {\beta_c} the system has more than one stable equilibrium point. Maybe more important, if we compute the energy of these most probable states

\displaystyle  h(\beta) = \lim \frac{1}{N} H_N(\textrm{most probable state})

then this function has a discontinuity at {\beta=\beta_c}. I will try to show in the weeks to come how this behaviour can dramatically slow down usual Monte-Carlo approach to the study of these kind of models.

Hugo Touchette has a very nice review of statistical physics that I like a lot and a good survey of the Potts model. Also T. Tao has a very nice exposition of related models. The blog of Georg von Hippel is dedicated to similar models on lattices, which are far more complex that this mean field approximation presented here.

MCMC Simulations

These is extremely easy to simulate this mean field Potts model since we only need to keep track of the ratio {L=(L_1, \ldots, L_q)} to have an accurate picture of the system. For example, a typical Markov Chain Monte Carlo approach would run as follows:

  • choose a particle {x_i} uniformly at random in {\{1,2, \ldots, N\}}
  • try to switch its value uniformly in {\{1,2, \ldots, q\} \setminus \{x_i\}}
  • compute the Metropolis ratio
  • update accordingly.

If we do that {10^5} times for {q=3} states at inverse temperature {\beta=1.5} and for {100} particles (which is fine since we only need to keep track of the {3}-dimensional ratio vector) and plot the result in barycentric coordinates we get a picture that looks like:
Potts Model MCMC

Here I started with a configuration where all the particles were in the same states i.e ratio vector equal to {(1,0,0)}. We can see that even with {10^5} steps, the algorithm struggles to go from one most probable position {(a,b,b)} to the other two {(b,a,b)} and {(b,b,a)} – in this simulation, one of the most probable state has even not been visited! Indeed, this approach was extremely naive, and this is quite interesting to try to come up with better algorithms. Btw, Christian Robert’s blog has tons of interesting stuffs related to MCMC and how to boost up the naive approach presented here.

Challenge – Announcement

Have you already heard about the 17×17 Challenge ? It all started 7 months ago, and still no solution has been found. I have the feeling though that no really serious attempt at solving the problem has been done, even if very encouraging results have been found. It could be very interesting to keep track of the progress made so that results will be published here. Send what you have reached so far!

In the next few weeks, I will try to write about the Monte Carlo approaches described in the excellent book “Monte Carlo Strategies in Scientific Computing” by Jun Liu: I do not think that one can easily solve the problem with a direct implementation of these methods, but I want to know more about it. These are all more or less Markov Chain Monte Carlo Methods: I have tried the “parallel tempering” method, which gives results that are not that bad, and was planning to check the “flat histogram” method. “Simulated annealing” methods have succesfully been used to solve the Sudoku problem though: see Christian Robert blog for example. I am also planning to continue my reading of the fascinating book “Information, Physics, and Computation” by Marc Mezard: methods described in this book (belief propagation, k-SAT problem, etc…) are more adapted to the 17×17 challenge I think, and I am quite optimistic about it.



Best Solution so far, by Rohan Puttagunta: only 4 monochromatic rectangles!

Read also here,here,here,here and there – and have a try with this on-line simulator !

Random permutations and giant cycles

The other day, Nathanael Berestycki presented some very nice results on random permutations. Start with the identity permutation of {\mathfrak{S}_N} and compose with random transpositions: in short, at time {k} we consider the random permutation {\sigma_k = \tau_k \circ \ldots \tau_{2} \circ \tau_1} where {\tau_i} is a transposition chosen uniformly at random among the {\frac{N(N-1)}{2}} transpositions of {\mathfrak{S}_N}. How long do we have to wait before seeing a permutation that has a cycle of length greater than {\epsilon N}, where {\epsilon>0} is a fixed constant ?

The answer has been known for quite a long time (O. Schramm,2005), but no simple proof was available: the highlight of Nathanael’s talk was a short and elegant proof of the fact that we only have to wait a little bit more than {N/2} steps to see this giant cycle! The first half of the proof is such a beauty that I cannot resist in sharing it here!

1. Visual representation

First, this is more visual to represent a permutation by its cycle decomposition. Consider for example the permutation {\sigma = (1,3,4)(5,6,8,9,10,11)(2)(7) \in \mathfrak{S}_{10}} that sends {1 \rightarrow 3 \rightarrow 4 \rightarrow 1} etc… It can be graphically represented by the following pictures:

cycle structure - visual representation

cycle structure - visual representation

If this permutation is composed with the transposition {(4,6)} , this gives the following cycle structure:

two cycles merge

two cycles merge

If  it is composed with {(8,11)} this gives,

a cycles breaks down into two

a cycles breaks down into two

It is thus pretty clear that the cycle structure of {\sigma_{k+1}} can be obtained from the cycle structure of {\sigma_k} in only 2 ways:

  • or two cycles of {\sigma_k} merge into one,
  • or one cycle of {\sigma_k} breaks into two cycles.

In short, the process {\{\sigma_k\}_k} behaves like a fragmentation-coalescence processes. At the beginning, the cycle structure of {\sigma_0=\text{Id}} is trivial: there are {N} trivial cycles of length one! Then, as times goes by, they begin to merge, and sometimes they also break down. At each step two integers {i} and {j} are chosen uniformly at random (with {i \neq j}): if {i} and {j} belong to the same cycle, this cycle breaks into two parts. Otherwise the cycle that contains {i} merges with the cycle that contains {j}. Of course, since at the beginning all the cycles are extremely small, the probability that a cycles breaks into two part is extremely small: a cycle of size {k} breaks into two part with probability roughly equal to {\Big(\frac{k}{N}\Big)^2}.

2. Comparison with the Erdos-Renyi random Graph

Since the fragmentation part of the process can be neglected at the beginning, this is natural to compare the cycle structure at time {k} with the rangom graph {G(N,p_k)} where {\frac{N(N-1)}{2}p_k = k}: this graph has {N} vertices and roughly {k} edges since each vertex is open with probability {p_k}. A coupling argument makes this idea more precise:

  1. start with {\sigma_0=\textrm{Id}} and the graph {G_0} that has {N} disconnected vertices {\{1,2, \ldots, N\}}
  2. choose two different integers {i \neq j}
  3. define {\sigma_{k+1}=(i,j) \circ \sigma_k}
  4. to define {G_{k+1}}, take {G_k} and join the vertices {i} and {j} (never mind if they were already joined)
  5. go to step {2}.

The fundamental remark is that each cycle of {\sigma_k} is included in a connected component of {G_k}: this is true at the beginning, and this property obviously remains true at each step. The Erdos-Renyi random graph {G(n,p)} is one of the most studied object: it is well-known that for {c<1}, with probability tending to {1}, the size of the largest connected component of {G(N,\frac{c}{N})} is of order {\ln(N)}. Since {p_k \sim \frac{2k}{N^2}}, this immediately shows that for {k \approx \alpha N}, with {\alpha < \frac{1}{2}} the length of the longuest cycle in {\sigma_k} is of order {\ln(N)} or less. This is one half of the result. It remains to work a little bit to show that if we wait a little bit more than {\frac{N}{2}}, the first cycle of order {N} appears.

For a permutation {\sigma \in \mathfrak{S}_N} with cycle decomposition {\sigma = \prod_{j=1}^r c_i} (cycles of length {1} included), one can consider the quantity

\displaystyle  T(\sigma) = \sum_{j=1}^R |c_j|^2.

This is a quantity between {N} (for the identity) and {N^2} that measures in some extends the sizes of the cycles. In order to normalize everything and to take into account that the cycle structure abruptly changes near the critical value {\frac{N}{2}}, one can also observe this quantity on a logarithmic scale:

\displaystyle  s(\sigma) = \frac{ \ln\Big( T(\sigma)/N^2 \Big) }{\ln(N)}.

This quantity {s(\sigma)} lies between {0} and {1}, and evolves quite smoothly with time: look at what it looks like, once the time has been rescaled by a factor {N}:

fluid limit ?

fluid limit ?

The plot represented the evolution of the quantity {s\Big(\sigma(\alpha N)\Big)} between time k=0 and k=\frac{N}{2} for dimensions N=10000,20000,30000.

Question: does the random quantity s(\alpha,N)={s\Big(\sigma(\alpha N)\Big)} converge (in probability) towards a non trivial constant {S(\alpha) \in (0,1)} ? It must be the analogue quantity for the random graph {G(N,\frac{2\alpha}{N})}, so I bet it is a known result …

Doob H-transforms

I read today about Doob h-transforms in the Rogers-Williams … It is done quite quickly in the book so that I decided to practice on some simple examples to see how this works.

So we have a Markov process {X_t} living in the state space {S}, and we want to see how this process looks like if we condition on the event {X_T \in A} where {A} is a subset of the state space. To fix the notations we define {p(t,t+s,x,y) = P(X_{t+s}=y|X_t=x)} and {h(t,x)=P(X_T \in A \, | X_t=x)}. The conditioned semi-group {\hat{p}(t,t+s,x,y)=P(X_{t+s}=y|X_t=x, X_T \in A)} is quite easily computed from {p} and {h}. Indeed, this also equals

\displaystyle \hat{p}(t,t+s,x,y) = \frac{P(X_{t+s}=y; X_T \in A\;|X_t=x)}{P(X_T \in A \,|X_t=x)} = p(t,t+s,x,y) \frac{h(t+s,y)}{h(t,x)}.

Notice also that {\hat{p}(t,t+s,x,y) = p(t,t+s,x,y) \frac{h(t+s,y)}{h(t,x)}} is indeed a Markov kernel in the sense that {\int_{y} \hat{p}(t,t+s,x,y) \, dy = 1}: the only property needed for that is

\displaystyle  h(t,x) = \int_{y} p(t,t+s,x,y)h(t+s,y)\,dy = E\left[ h(t+s,X_{t+s}) \, |X_t=x\right].

In fact, we could take any function {h} that satisfies this equality and define a new Markovian kernel {\hat{p}} and study the associated Markov process. That’s what people usually do by the way.

Remark 1 we almost never know explicitly the quantity {h(t,x)}, except in some extremely simple cases !

Before trying these ideas on some simple examples, let us see what this says on the generator of the process:

  1. continuous time Markov chains, finite state space:let us suppose that the intensity matrix is {Q} and that we want to know the dynamic on {[0,T]} of this Markov chain conditioned on the event {X_T=z}. Indeed {p(t,t+s,i,j) = [\exp(sQ)]_{i,j}} so that {\hat{p}(t,t+s,i,j) = [\exp(sQ)]_{i,j} \frac{p(t+s,T,j,z)}{p(t,T,i,z)}} so that in the limit we see that at time {t}, the intensity of the jump from {i} to {j} of the conditioned Markov chain is
    \displaystyle  Q(i,j) \frac{p(t+s,T,j,z)}{p(t,T,i,z)}.

    Notice how this behaves while {t \rightarrow T}: if at {t=T-\epsilon} the Markov chain is in state {i \neq z} then the intensity of jump from {i} to {z} is equivalent to {\approx \frac{1}{\epsilon}}.

  2. diffusion processes:this time consider a {1}-dimensional diffusion {dX_t = \mu(X_t) \, dt + \sigma(X_t) \, dW_t} on {[0,T]} conditioned on the event {X_T \in A} and define as before {h(t,x)=P(X_T \in A \,|X_t=x)}. The generator of the (non-homogeneous) conditioned diffusion is defined at time {t} by
    \displaystyle  \begin{array}{rcl}  \mathcal{G}^{(t)} f(x) &=& \lim_{s \rightarrow 0} \frac{1}{s} \Big( E\left[f(X_{t+s}) \,| X_t=x, X_T \in A\right]-f(x) \Big)\\ &=& \lim_{s \rightarrow 0} \frac{E\left[ f(X_{t+s}) h(t+s, X_{t+s}) \,| X_t=x\right]-f(x) }{s\,h(t,x)} \end{array}

    so that if {\mathcal{L} = \mu \partial_x + \frac{1}{2} \sigma^2 \partial^2_{xx}} is the generator of the original diffusion we get

    \displaystyle  \mathcal{G}^{(t)} f = \frac{1}{h} \Big(\partial_t + \mathcal{L})(hf).

    Because {(\partial_t + \mathcal{L})h=0}, this also reads

    \displaystyle  \mathcal{G}^{(t)} f = \mathcal{L}f + \sigma^2 \frac{\partial_x h}{h} \partial_x f.

    This means that the conditioned diffusion {Z} follows the SDE:

    \displaystyle  dZ_t = \Big( \mu(Z_T) + \sigma(Z_t)^2 \frac{\partial_x h(t,Z_t)}{h(t,Z_t)} \Big) \, dt+ \sigma(Z_t) dW_t.

    The volatility function remains the same while an additional drift shows up.

We will try these ideas on some examples where the probability densities are extremely simple. Notice that in the case of diffusions, if we take {A=\{x^+\}}, the function {(t,x) \mapsto P(X_T = x^+ \, |X_t=x)} is identically equal to {0} (except degenerate cases): to condition on the event {X_T=x} we need instead to take {h(t,x)} to be the transition probability {p(t,T,x,x^+)}. This follows from the approximation {P(X_T \in (x^+,x^+ + dx) \,|X_t=x ) \approx p(t,T,x,x^+) \, dx + o(dx)}. Let’s do it:

  • Brownian Bridge on {[0,T]}:in this case {p(t,T,x,0) \propto e^{-\frac{|x-y|^2}{2(T-t)}}} so that the additional drift reads {\frac{-x}{T-t}}: a Brownian bridge follows the SDE
    \displaystyle  dX_t = -\frac{X_t}{T-t} \, dt + dW_T.

    This might not be the best way to simulate a Brownian bridge though!

  • Poisson Bridge on {[0,T]}:we condition a Poisson process of rate {\lambda} on the event {X_T=N}. The intensity matrix is simply {Q(k,k+1)=\lambda=-Q(k,k)} and {0} everywhere else while the transition probabilities are given by {p(t,T,k,N) = e^{-\lambda (T-t)} \frac{(\lambda (T-t) )^{N-k}}{(N-k)!}}. This is why at time {t}, the intensity from {k} to {k+1} is given by
    \displaystyle  \lambda(t,k,k+1) = \frac{N-k}{T-t}.

    Again, that might not be the most efficient way to simulate a Poisson Bridge ! Notice how the intensity {\lambda} has disappeared …

  • Ornstein-Uhlenbeck Bridge:Let’s consider the usual OU process given by the dynamic {dX_t = -X_t + \sqrt{2}dW_t}: the invariant probability is the usual centred Gaussian distribution. Say that we want to know how does such an OU process behave if we condition on the event {X_T = z}. Because {p(t,T,x,z) \propto \exp(-\frac{|z-e^{-(T-t)}x|^2}{2(1-e^{-2(T-t)})^2})} we find that the conditioned O-U process follows the SDE
    \displaystyle dX_t = \Big(\frac{z-e^{-(T-t)x}}{e^{T-t}(1-e^{-2(T-t)})^2} - X_t \Big)\, dt+ \sqrt{2} \, dW_t.

    If we Taylor expand the additonal drift, it can be seen that this term behaves exactly as in the case of the Brownian bridge. Below is a plot of an O-U process conditioned on the event {X_{10} = 10}, starting from {X_0=0}.

    conditioned O-U process

    conditioned O-U process

Brownian particles on a circle

Today, James norris gave a talk related to Diffusion-limited aggregation processes and mentioned, in passing, the following amusing fact: put {N} equidistant Brownian particles {W_1, \ldots, W_N} on the circle with unit circumference and let them evolved. When two of them collide they get stuck to each other and continue together afterwards: after a certain amount of time {T_N}, only one particle remains. Perhaps surprisingly, this is extremely easy to obtain the first few properties of {T_N}. For example, {\lim_N \mathop{\mathbb E}\left[ T_N \right] = \frac{1}{6}}.

To see that, define {D_k} the distance between {W_k} and {W_{k+1}} (modulo {N}) so that {D_k = \frac{1}{N}} for {k=1,2 \ldots, N}. Notice then (It\^o’s formula) that

\displaystyle  M_t = e^{\lambda t} \sum_{k=1^N} \sin(\sqrt{\lambda} D_k)

is a (local) martingale that starts from {M_0 = N \sin(\frac{\sqrt{\lambda}}{N})}. Also, at time {T_N}, exactly {N-1} of the distances {D_1, D_2, \ldots, D_N} are equal to {0} while one of them is equal to {1}: this is why {M_{T_N} = e^{\lambda T_N} \sin(\sqrt{\lambda})}. The end is clear: apply the optional sampling theorem (to be rigorous, take {\lambda} not too big, or do some kind of truncations to be sure that the optional sampling theoem applies) to conclude that

\displaystyle  \mathop{\mathbb E}\left[ e^{\lambda T_N} \right] = \frac{N \sin(\frac{\sqrt{\lambda}}{N})}{\sin(\sqrt{\lambda})}.

This gives for example {\mathop{\mathbb E}\left[ T_N \right] = \frac{1}{6}(1-\frac{1}{N^2})}. I just find it cute!

So what if we do that on a segment ?

On the Wiener space

I would like to discuss in this post the importance of the heuristic functional

\displaystyle I(W) := \frac{1}{2} \int_0^T |\frac{d}{dt} W|^2 \, dt \ \ \ \ \ (1)


that often shows up when doing analysis on the Wiener space {S = (C([0,T], {\mathbb R}), \|\cdot\|_{\infty})}: an element of the Wiener space is traditionally denoted by {\omega} – this is a continuous function and {\omega(t)} is its value at time {t \in [0,T]}. For a (nice) subset {A} of {S}, the Wiener measure of {A} is nothing else than the probability that a Brownian path belongs to {A}. Having said that, the quantity {I(W)} hardly makes sense since a Brownian path {(W_t: t \in [0,T])} is (almost surely) non differentiable anywhere: still, this is a very useful heuristic in many situations. A review of probability can be found on the excellent blog of Terry Tao.

Where does it come from ?

As often, this is very instructive to come back to the discrete setting. Consider a time interval {[0,T]} and a discretization parameter {\Delta t = \frac{T}{N}}: a discrete Brownian path is represented by the {N}-tuple

\displaystyle W^{(N)} = (W_{\Delta t},W_{2\Delta t},\ldots,W_{T}) \in {\mathbb R}^{N}.

The random variables {\Delta W_k := W_{k \Delta t} - W_{(k-1) \Delta t}} are independent centred Gaussian variables with variance {\Delta t} so that the random vector {W^{(N)}} has a density {\mathop{\mathbb P}^{(N)}} with respect to the {N}-dimensional Lebesgue measure

\displaystyle \begin{array}{rcl} \frac{d \mathop{\mathbb P}^{(N)}}{d \lambda^{\textrm{Leb}}}(W^{(N)}) &\propto& \exp\{-\frac{1}{2 \Delta t} \sum_{k=1}^N |W_{k \Delta t} - W_{(k-1) \Delta t}|^2\} \\ &=& \exp\{-\frac{1}{2} \sum_{k=1}^N |\frac{W_{k \Delta t} - W_{(k-1) \Delta t}}{\Delta t}|^2 \, \Delta t\} \\ &=& \exp\{- I^{(N)}(W^{N})\}. \end{array}

The functional {I^{(N)} := \frac{1}{2} \sum_{k=1}^N |\frac{W_{k \Delta t} - W_{(k-1) \Delta t}}{\Delta t}|^2 \, \Delta t} is indeed a discretization of {I}. Informally, the Wiener measure has a density proportional to {\exp\{-I(W)\}} with respect to the “infinite dimensional Lebesgue measure”: this does not make much sense because there is no such thing as the infinite dimensional Lebesgue measure. This should be understood as the limiting case {N \rightarrow \infty} of the discretization procedure presented above. Indeed, this is not an absolute non-sense to say that

\displaystyle \mathop{\mathbb P}[ \omega = g] \sim e^{-I(g)}. \ \ \ \ \ (2)


because we will see that if {f,g} are two nice functions then

\displaystyle \lim_{ \epsilon \rightarrow 0} \frac{P[ \|W-f\|_{\infty} < \epsilon]}{P[ \|W-g\|_{\infty} < \epsilon]} = \frac{e^{-I(f)}}{e^{-I(g)}}.

It is then very convenient to write

\displaystyle \mathop{\mathbb P}[\omega \in A] = \int_{A} e^{-I(W)} \, d\lambda(W)

where {\lambda} is a fictional infinite dimensional Lebesgue measure (ie: translation invariant).

Translations in the Wiener space

As a first illustration of the heuristic {\mathop{\mathbb P}[\omega = g] \sim e^{-I(g)}}, let see how the Wiener measure behave under translations. If we choose a nice continuous function {f} such that {I(f)} is well defined (ie: {\dot{f} \in L^2([0,T]))}), a translated probability measure {\mathop{\mathbb P}^{f}} can be defined through the relation

\displaystyle \mathop{\mathbb P}^f(A) := \mathop{\mathbb P}(f+\omega \in A). \ \ \ \ \ (3)


This is not clear that {\mathop{\mathbb P}^f} is absolutely continuous with respect to the Wiener measure {\mathop{\mathbb P}}. Of course, we impose that {f(0)=0}. For a set {A \subset S}, the heuristic says that

\displaystyle \begin{array}{rcl} \mathop{\mathbb P}^f( \omega \in A) &=& \mathop{\mathbb P}(\omega \in A-f) = \int_{A-f} e^{-I(W)} \, d\lambda(W)\\ &\stackrel{\textrm{(trans. inv.)}}{=}& \int_{A} e^{-I(W-f)} \, d\lambda(W)\\ &=& \int_{A} e^{-\frac{1}{2} \int_0^T |\dot{W}-\dot{f}|^2 \, dt} \, d\lambda(W)\\ &=& \int_{A} e^{\int_0^T \dot{f} \dot{W} \, dt -\frac{1}{2} \int_0^T \dot{f}^2 \, dt} e^{-I(W)}\, d\lambda. \end{array}

This is why, writing {\dot{W} \, dt = dW}, we obtain the following change of probability formula

Proposition 1 Cameron-Martin-Girsanov change of probability formula:

for any continuous function {f} such that {f(0)=0} and {\dot{f} \in L^2([0,T])},

\displaystyle \frac{d \mathop{\mathbb P}^f}{d \mathop{\mathbb P}}(W) = Z^f(W) := \exp\{ \int_0^T \dot{f} dW_t -\frac{1}{2} \int_0^T \dot{f}^2 \, dt \} \ \ \ \ \ (4)


This change of probability formula is extremely useful since this is typically much more convenient to work with a Brownian motion {W_t} than with a drifted Brownian motion {f(t) + W_t}. In many situations, we get rid of the annoying stochastic integral {\int_0^T \dot{f} dW_t}: if {f} is regular enough ({f \in C^3([0,T])}, say) we have

\displaystyle \begin{array}{rcl} \frac{d \mathop{\mathbb P}^f}{d \mathop{\mathbb P}}(W) &=& Z^f(W)\\ &:=& \exp\{ \dot{f}(T)W(T) - \int_0^T \ddot{f}(t) W(t)\, dt -\frac{1}{2} \int_0^T \dot{f}^2 \, dt\}. \end{array}


The next section is a straightforwards application of this change of variable formula.

Probability to be in an {\epsilon}-tube

Suppose that {f,g} are two nice functions (smooth, say): for small {\epsilon \ll 1}, what is a good approximation of the quotient

\displaystyle Q(f,g,\epsilon) = \frac{\mathop{\mathbb P}(\|W-f\|_{\infty}<\epsilon)}{\mathop{\mathbb P}(\|W-g\|_{\infty}<\epsilon)}.

In words, this basically asks the question: how more probable is the event {\{ W \textrm{looks like } f\}} than the event {\{ W \textrm{looks like } g\}} ? Of course, since this can also be read as

\displaystyle Q(f,g,\epsilon) = \frac{Q(f,0,\epsilon)}{Q(g,0,\epsilon)}

where {0} indicates the function identically equal to zero, it suffices to consider the case {g=0}. If we introduce the event

\displaystyle A_{\epsilon} = \{ \omega \in S: \|\omega\|_{\infty} < \epsilon\}),

the quotient {Q(f,0,\epsilon)} is equal to {\frac{\mathop{\mathbb P}^f(A_{\epsilon})}{\mathop{\mathbb P}(A_{\epsilon})}}. This why, using the change of probability formula (4),

\displaystyle \begin{array}{rcl} Q(f,0,\epsilon) &=& \frac{\mathop{\mathbb P}^f(A_{\epsilon})}{\mathop{\mathbb P}(A_{\epsilon})} = \frac{\mathop{\mathbb E}\left[ 1_{A_{\epsilon}}(W) Z^f(W) \right]}{\mathop{\mathbb E}\left[ 1_{A_{\epsilon}}(W) \right]} \end{array}

with {Z^f(W) = \exp\{\dot{f}(T)W(T) - \int_0^T \ddot{f}(t) W(t)\, dt -\frac{1}{2} \int_0^T \dot{f}^2 \, dt\}}. If {\|\dot{f}\|_{\infty}, \|\ddot{f}\|_{\infty} \leq C}, this is clear that for {W \in A},

\displaystyle -(C\epsilon + \epsilon C T) < \dot{f}(T)W(T) - \int_0^T \ddot{f}(t) W(t)\, dt < C\epsilon + \epsilon C T.

Both sides going to zero when {\epsilon} goes to zero, this is enough to conclude that

\displaystyle \lim_{\epsilon \rightarrow 0} Q(f,0,\epsilon) = \exp\{-\frac{1}{2} \int_0^T \dot{f}^2 \, dt\}\} = \exp\{ -I(f)\}. \ \ \ \ \ (5)

In short, for any two reasonably nice functions (for example {f,g \in C^3([0,T])}) {f,g} that satisfy {f(0)=g(0)=0},

\displaystyle \lim_{\epsilon \rightarrow 0} Q(f,0,\epsilon) \frac{\mathop{\mathbb P}(\|W-f\|_{\infty}<\epsilon)}{\mathop{\mathbb P}(\|W-g\|_{\infty}<\epsilon)} = \frac{ \exp\{ -I(f)\}}{\exp\{ -I(g)\} }. \ \ \ \ \ (6)

Large deviation result

Take a subset {A} of {S} (it might be useful to think of sets like {A_{f,\epsilon,\alpha} = \{\omega: |\omega(u)-f(u)| < \alpha \, \forall u \in U\}}). We are interested to the probability that the rescaled (in space) Brownian motion

\displaystyle W^{(\epsilon)}(t) = \epsilon W(t)

belongs to {A} when {\epsilon} goes to {0}. Typically, if the null function does not belong to (the closure of) {A}, the probability {\mathop{\mathbb P}( W^{(\epsilon)} \in A)} is exponentially small. It turns out that if {A} is regular enough

\displaystyle \ln \mathop{\mathbb P}( \epsilon W \in A) \sim -\epsilon^2 \inf_{f \in A} I(f) := -\epsilon^2 I(A).

Again, the usual heuristic gives this result in no time if we accept not to be too rigorous:

\displaystyle \begin{array}{rcl} \epsilon^2 \, \ln \mathop{\mathbb P}( \epsilon W \in A) &=& \epsilon^2 \, \ln \int_{\epsilon W \in A} e^{-I(W)} \, d\lambda \\ &=& \epsilon^2 \, \ln \int_{W \in A} e^{-I(\frac{W}{\epsilon})} \textrm{(Jacobian)}\, d\lambda \\ &=& \epsilon^2 \, \ln \int_{W \in A} e^{-\frac{I(W)}{\epsilon^2}} \textrm{(Jacobian)}\, d\lambda \\ &\stackrel{\epsilon \rightarrow 0}{\rightarrow}& -\inf \{I(f): f \in A\}. \end{array}

This is very fishy since the Jacobian should behave very badly (actually the measure {\mathop{\mathbb P}[W \in \cdot]} and {\mathop{\mathbb P}[\epsilon W \in \cdot]} are mutually singular) but all this mess can be made perfectly rigorous. Nevertheless, the basic idea is almost there, and it can be proved (Freidlin-Wentzel theory) that for any open set G,

\displaystyle \liminf \epsilon^2 \, \ln \mathop{\mathbb P}( \epsilon W \in G) \geq -\inf \{I(f): f \in G\}

while for any closed set {F},

\displaystyle \limsup \epsilon^2 \, \ln \mathop{\mathbb P}( \epsilon W \in F) \leq -\inf \{I(f): f \in F\}.

One cleaner way to prove this is to used the usual Cramer theorem of large deviations for sums of i.i.d random variables (in Banach space) and notice that for {\epsilon(N) = \frac{1}{\sqrt{N}} } then

\displaystyle W^{\epsilon(N)} = \frac{W_1+W_2+\ldots+W_N}{N}

where {(W_i:i=1,2,\ldots,N)} are independent standard Brownian motions. Cramer theorem states that

\displaystyle \mathop{\mathbb P}[ \frac{W_1+W_2+\ldots+W_N}{N} \in A] \sim \exp\{-N \inf\{I(f): f \in A\} \}


\displaystyle I(f) = \sup\{ \int_{0}^t f(t)g(t)\, dt - \ln \, \mathop{\mathbb E} e^{ \int_0^T g(t) W(t) \, dt } : g \in L^2([0,T])\}.

This is not very hard to see that the supremum is indeed {\frac{1}{2} \int_0^T \dot{f}^2 \, dt}.

Newer entries »