Fluid Limits and Random Graphs

The other day Christina Goldschmidt gave a very nice talk on results obtained by James Norris and R. W. R. Darling: in their paper they use fluid limit techniques to study properties of random graphs. This approach has long been used to analyse queueing systems, for example.

1. Erdos-Renyi random graph model

Remember that the Erdos-Renyi random graph {\mathcal{G}(N,p)} is a graph with {N} vertices {\{1,2, \ldots, N\}} such that any two of them are connected with probability {p}, independently of all the other pairs.

Erdos-Renyi random graph model

This model of random graphs has been extensively studied, and it has long been known that {\mathcal{G}(N,\frac{c}{N})} undergoes a phase transition at {c=1}: for {c<1}, the largest connected component {L^{(N,c)}} of {\mathcal{G}(N,\frac{c}{N})} has size of order {\log(N)}, while for {c>1} the largest connected component {L^{(N,c)}}, also called the giant component, has size of order {N}. Another consequence of this phase transition was used in another blog entry.

One can even show that for {c>1} there exists a constant {K_c} such that for any {\epsilon > 0},

\displaystyle  \lim_{N \rightarrow \infty} \; \; \mathop{\mathbb P}\Big( |\frac{\textrm{Card}(L^{(N,c)})}{N} - K_c| > \epsilon \Big) = 0.

Simply put, the largest connected component of {\mathcal{G}(N,\frac{c}{N})} has size approximately equal to {K_c \cdot N}. Maybe surprisingly, it is quite simple to compute the value of {K_c} through fluid limit techniques.
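Before any computation, this phase transition is easy to see numerically. Here is a quick simulation sketch (plain Python with a small union-find structure; the function name and bookkeeping are mine) that samples {\mathcal{G}(N, \frac{c}{N})} and returns the fraction of vertices in the largest connected component:

```python
import random

def largest_component_fraction(N, c, seed=0):
    """Sample G(N, c/N) and return (size of largest component) / N."""
    rng = random.Random(seed)
    parent = list(range(N))  # union-find forest over the vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    p = c / N
    # each of the N(N-1)/2 pairs is an edge independently with probability p
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    sizes = {}
    for v in range(N):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / N
```

For {c=0.5} the returned fraction is tiny (the largest component has only logarithmic size), while for {c=2} it should be close to {K_2 \approx 0.797}.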

2. Exploration of the graph

Let {G} be a configuration of {\mathcal{G}(N, \frac{c}{N})}. This configuration is explored through a classical depth-first algorithm. To keep track of what is going on, a process {Z: \{1, \ldots, N\} \rightarrow \mathbb{Z}} is introduced. This process captures essentially all the information needed to study the sizes of the connected components: it is then shown that {Z} behaves essentially deterministically when viewed at the appropriate scale.

  1. initialisation ({t=0}): color all the vertices in green, make {v=1} the current vertex and define {Z_0=0}.
  2. {t=t+1}: color the current vertex {v} in red (= visited); every green vertex connected to {v} is colored in grey (such a vertex still has to be visited, but its ‘parent’ has already been visited): these are the children of {v}.
  3. update {Z} the following way, {Z_t = Z_{t-1} + (\textrm{Nb of children of }v) - 1}, and
    • if {v} has children, choose one of them and make it the new current vertex. Go to {2}.
    • if {v} has no children, backtrack towards the root: the new current vertex is the grey child of the closest ancestor of {v} that still has one. Go to {2}.
    • if {v} has no children and no such ancestor, the new current vertex is one of the remaining green vertices (if any). Go to {2}.
    • else: we are finished.
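The exploration above can be sketched in a few lines (an illustrative Python version, assuming the graph is given by adjacency lists; an explicit stack replaces the parent pointers, which gives the same depth-first order, and {Z} is updated exactly as in step 3):

```python
def exploration_process(adj):
    """Depth-first exploration of a graph given as adjacency lists.

    Returns the walk Z with Z[0] = 0; Z first hits -k exactly when the
    k-th explored component has been fully visited.
    """
    N = len(adj)
    GREEN, GREY, RED = 0, 1, 2
    color = [GREEN] * N
    Z = [0]
    stack = []  # grey (discovered but unvisited) vertices, most recent on top
    for start in range(N):
        if color[start] != GREEN:
            continue
        color[start] = GREY
        stack.append(start)
        while stack:
            v = stack.pop()
            color[v] = RED
            children = [w for w in adj[v] if color[w] == GREEN]
            for w in children:
                color[w] = GREY
                stack.append(w)
            Z.append(Z[-1] + len(children) - 1)  # step 3 of the algorithm
    return Z

def component_sizes(Z):
    """Read off the component sizes S_k = T_{k+1} - T_k from the walk Z."""
    sizes, last, k = [], 0, 1
    for t, z in enumerate(Z):
        if z == -k:
            sizes.append(t - last)
            last, k = t, k + 1
    return sizes
```

On a path 0-1-2 plus an edge 3-4, for instance, `exploration_process` returns `[0, 0, 0, -1, -1, -2]` and the component sizes are `[3, 2]`.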

depth first exploration

A moment of thought reveals that {Z_0=0} and that {Z} reaches {-1} precisely when the connected component of the vertex {1} has been entirely explored. Then {Z} reaches {-2} precisely when the second connected component has been entirely explored, etc. This is why, if {T_0=0} and {T_k = \inf \{t: Z_t = -k\}}, the connected components have sizes exactly {S_k = T_{k+1}-T_k} for {k=0,1,2, \ldots}. The largest component has size {\max_k S_k}.

Z process

3. Fluid Limit

It is quite easy to see why the rescaled process {z^N(t) = \frac{ Z_{[Nt]}}{N}} should behave like the solution of a differential equation (i.e. a fluid limit). While exploring the first component, {Z_k} represents the number of grey vertices at time {k}, so that {N-k-Z_k} represents the number of green vertices: this is why

\displaystyle  \mathop{\mathbb E}\Big[Z_{k+1}-Z_k \;|\;Z_k \Big] = (N-k-Z_k) \frac{c}{N} - 1 = c\Big(1-\frac{k}{N} - \frac{Z_k}{N} \Big) - 1

so that {\mathop{\mathbb E}\Big[ \frac{z^N(t+\Delta)-z^N(t)}{\Delta} \;|\; z^N(t) \Big] \approx c\Big(1-t-z(t) \Big) - 1} where {\Delta = \frac{1}{N}}. Also, the variance satisfies {\text{Var}\Big[ \frac{z^N(t+\Delta)-z^N(t)}{\sqrt{\Delta}} \;|\; z^N(t) \Big] \approx N^{-1} c\Big(1-t-z(t) \Big) \rightarrow 0}, so that all the ingredients for a fluid limit are present: {z^N} converges to the solution of the differential equation

\displaystyle  \frac{d}{dt} z(t) = c\Big(1-t-z(t) \Big) - 1

whose solution is {z(t) = (1-t)-e^{-ct}}. The exploration of the giant component ends at the first positive time where {z} vanishes, i.e. at {t=K_c}: this is why {K_c > 0} is implicitly defined as the solution of

\displaystyle  1-K_c = e^{-c K_c}.

As expected, {\lim_{c \rightarrow 1^+} K_c = 0} and {\lim_{c \rightarrow +\infty} K_c = 1}.
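Numerically, {K_c} is easy to extract from the equivalent fixed-point form {K = 1 - e^{-cK}} (a minimal sketch; the iteration below converges to the positive root for {c>1}, since the map is a contraction near that root):

```python
import math

def giant_component_fraction(c, iters=100):
    """Solve 1 - K = exp(-c K) for the positive root K_c (for c > 1)
    by iterating the map K -> 1 - exp(-c K)."""
    K = 1.0  # starting above the root keeps the iteration on the right branch
    for _ in range(iters):
        K = 1.0 - math.exp(-c * K)
    return K
```

For instance, {K_2 \approx 0.7968}: a bit less than 80% of the vertices of {\mathcal{G}(N,\frac{2}{N})} belong to the giant component.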

Fluid Limit

Random permutations and giant cycles

The other day, Nathanael Berestycki presented some very nice results on random permutations. Start with the identity permutation of {\mathfrak{S}_N} and compose with random transpositions: in short, at time {k} we consider the random permutation {\sigma_k = \tau_k \circ \ldots \circ \tau_{2} \circ \tau_1} where the {\tau_i} are transpositions chosen uniformly at random among the {\frac{N(N-1)}{2}} transpositions of {\mathfrak{S}_N}. How long do we have to wait before seeing a permutation that has a cycle of length greater than {\epsilon N}, where {\epsilon>0} is a fixed constant?

The answer has been known for quite a long time (O. Schramm, 2005), but no simple proof was available: the highlight of Nathanael’s talk was a short and elegant proof of the fact that we only have to wait a little bit more than {N/2} steps to see this giant cycle! The first half of the proof is such a beauty that I cannot resist sharing it here!

1. Visual representation

First, it is more visual to represent a permutation by its cycle decomposition. Consider for example the permutation {\sigma = (1,3,4)(5,6,8,9,10,11)(2)(7) \in \mathfrak{S}_{11}} that sends {1 \rightarrow 3 \rightarrow 4 \rightarrow 1}, etc. It can be graphically represented by the following picture:

cycle structure - visual representation

If this permutation is composed with the transposition {(4,6)}, this gives the following cycle structure:

two cycles merge

If it is then composed with {(8,11)}, this gives:

a cycle breaks into two

It is thus pretty clear that the cycle structure of {\sigma_{k+1}} can be obtained from the cycle structure of {\sigma_k} in only two ways:

  • either two cycles of {\sigma_k} merge into one,
  • or one cycle of {\sigma_k} breaks into two cycles.

In short, the process {\{\sigma_k\}_k} behaves like a fragmentation-coalescence process. At the beginning, the cycle structure of {\sigma_0=\text{Id}} is trivial: there are {N} trivial cycles of length one! Then, as time goes by, they begin to merge, and sometimes they also break down. At each step two integers {i} and {j} are chosen uniformly at random (with {i \neq j}): if {i} and {j} belong to the same cycle, this cycle breaks into two parts; otherwise the cycle that contains {i} merges with the cycle that contains {j}. Of course, since at the beginning all the cycles are extremely small, the probability that a cycle breaks into two parts is extremely small: a cycle of size {k} breaks into two parts with probability roughly equal to {\Big(\frac{k}{N}\Big)^2}.
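This dynamic is easy to simulate (a sketch; the code composes the transpositions on the right rather than the left, which does not change the law of the cycle structure):

```python
import random

def cycle_lengths(N, steps, seed=0):
    """Compose `steps` uniform random transpositions starting from the
    identity of S_N and return the cycle lengths, largest first."""
    rng = random.Random(seed)
    sigma = list(range(N))  # sigma[i] = image of i
    for _ in range(steps):
        i, j = rng.sample(range(N), 2)       # two distinct integers
        sigma[i], sigma[j] = sigma[j], sigma[i]  # compose with (i, j)
    # extract the cycle lengths by following the orbits
    lengths, seen = [], [False] * N
    for v in range(N):
        if not seen[v]:
            length, w = 0, v
            while not seen[w]:
                seen[w] = True
                length += 1
                w = sigma[w]
            lengths.append(length)
    return sorted(lengths, reverse=True)
```

Running this for various numbers of steps illustrates the transition: well below {N/2} steps all cycles remain tiny, while a little after {N/2} steps cycles of macroscopic length start to appear.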

2. Comparison with the Erdos-Renyi random graph

Since the fragmentation part of the process can be neglected at the beginning, it is natural to compare the cycle structure at time {k} with the random graph {G(N,p_k)} where {\frac{N(N-1)}{2}p_k = k}: this graph has {N} vertices and roughly {k} edges since each edge is present with probability {p_k}. A coupling argument makes this idea more precise:

  1. start with {\sigma_0=\textrm{Id}} and the graph {G_0} that has {N} disconnected vertices {\{1,2, \ldots, N\}}
  2. choose two different integers {i \neq j} uniformly at random
  3. define {\sigma_{k+1}=(i,j) \circ \sigma_k}
  4. to define {G_{k+1}}, take {G_k} and join the vertices {i} and {j} (never mind if they were already joined)
  5. go to step {2}.
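The coupling can be run and checked step by step (a sketch using a union-find structure for the components of {G_k}; the asserted invariant, that {v} and {\sigma_k(v)} always lie in the same component, is exactly the statement that every cycle of {\sigma_k} sits inside a connected component of {G_k}):

```python
import random

def check_cycle_component_coupling(N, steps, seed=0):
    """Run the coupled chains (sigma_k, G_k) and assert, after every step,
    that v and sigma(v) lie in the same component of G for all v."""
    rng = random.Random(seed)
    sigma = list(range(N))
    parent = list(range(N))  # union-find for the components of G_k

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(steps):
        i, j = rng.sample(range(N), 2)
        sigma[i], sigma[j] = sigma[j], sigma[i]  # compose sigma with (i, j)
        parent[find(i)] = find(j)                # G gains the edge {i, j}
        # every cycle of sigma is contained in a component of G
        assert all(find(v) == find(sigma[v]) for v in range(N))
    return True
```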

The fundamental remark is that each cycle of {\sigma_k} is included in a connected component of {G_k}: this is true at the beginning, and this property obviously remains true at each step. The Erdos-Renyi random graph {G(N,p)} is one of the most studied objects in probability: it is well-known that for {c<1}, with probability tending to {1}, the size of the largest connected component of {G(N,\frac{c}{N})} is of order {\ln(N)}. Since {p_k \sim \frac{2k}{N^2}}, this immediately shows that for {k \approx \alpha N} with {\alpha < \frac{1}{2}}, the length of the longest cycle in {\sigma_k} is of order {\ln(N)} or less. This is one half of the result. It remains to work a little bit to show that if we wait a little bit more than {\frac{N}{2}} steps, the first cycle of order {N} appears.

For a permutation {\sigma \in \mathfrak{S}_N} with cycle decomposition {\sigma = \prod_{j=1}^r c_j} (cycles of length {1} included), one can consider the quantity

\displaystyle  T(\sigma) = \sum_{j=1}^r |c_j|^2.

This is a quantity between {N} (for the identity) and {N^2} (for a single cycle of length {N}) that to some extent measures the sizes of the cycles. In order to normalize everything and to take into account that the cycle structure abruptly changes near the critical value {\frac{N}{2}}, one can also observe this quantity on a logarithmic scale:

\displaystyle  s(\sigma) = \frac{ \ln\Big( T(\sigma)/N \Big) }{\ln(N)}.
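Concretely, {T(\sigma)} and {s(\sigma)} can be computed directly from the cycle decomposition (a sketch; here {s} is normalized as {\ln(T(\sigma)/N)/\ln(N)}, so that it equals {0} at the identity and {1} at a single {N}-cycle):

```python
import math

def cycle_statistic(sigma):
    """Return (T, s) for a permutation given as a list (sigma[i] = image of i):
    T = sum of squared cycle lengths, s = ln(T/N) / ln(N)."""
    N = len(sigma)
    seen = [False] * N
    T = 0
    for v in range(N):
        if not seen[v]:
            length, w = 0, v
            while not seen[w]:       # follow the orbit of v
                seen[w] = True
                length += 1
                w = sigma[w]
            T += length * length
    return T, math.log(T / N) / math.log(N)
```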

This quantity {s(\sigma)} lies between {0} and {1}, and evolves quite smoothly with time: look at what it looks like once time has been rescaled by a factor {N}:

fluid limit ?

The plot represents the evolution of the quantity {s\Big(\sigma(\alpha N)\Big)} between times {k=0} and {k=\frac{N}{2}} for sizes {N=10000, 20000, 30000}.

Question: does the random quantity s(\alpha,N)={s\Big(\sigma(\alpha N)\Big)} converge (in probability) towards a non trivial constant {S(\alpha) \in (0,1)} ? It must be the analogue quantity for the random graph {G(N,\frac{2\alpha}{N})}, so I bet it is a known result …