## Fluid Limits and Random Graphs

The other day Christina Goldschmidt gave a very nice talk on results obtained by James Norris and R. W. R. Darling: in their paper they use fluid limit techniques to study properties of random graphs. This approach has long been used to analyse queueing systems, for example.

1. The Erdos-Renyi random graph model

Recall that the Erdos-Renyi random graph ${\mathcal{G}(N,p)}$ is a graph with ${N}$ vertices ${\{1,2, \ldots, N\}}$ in which any two vertices are connected with probability ${p}$, independently of all other pairs.

Erdos-Renyi random graph model

This model of random graphs has been extensively studied and it has long been known that ${\mathcal{G}(N,\frac{c}{N})}$ undergoes a phase transition at ${c=1}$: for ${c<1}$, the largest component ${L^{(N,c)}}$ of ${\mathcal{G}(N,\frac{c}{N})}$ has a size of order ${\log(N)}$, while for ${c>1}$ the largest connected component ${L^{(N,c)}}$, also called the giant component, has size of order ${N}$. Another consequence of this phase transition was used in another blog entry.

One can even show that for ${c>1}$ there exists a constant ${K_c}$ such that for any ${\epsilon > 0}$,

$\displaystyle \lim_{N \rightarrow \infty} \; \; \mathop{\mathbb P}\Big( |\frac{\textrm{Card}(L^{(N,c)})}{N} - K_c| > \epsilon \Big) = 0.$

Simply put, the largest connected component of ${\mathcal{G}(N,\frac{c}{N})}$ has size approximately equal to ${K_c \cdot N}$. Perhaps surprisingly, it is quite simple to compute the value of ${K_c}$ through fluid limit techniques.

2. Exploration of the graph

Let ${G}$ be a configuration of ${\mathcal{G}(N, \frac{c}{N})}$. This configuration is explored through a classical depth-first algorithm. To keep track of what the algorithm is doing, a process ${Z: \{0, 1, \ldots, N\} \rightarrow \mathbb{Z}}$ is introduced. This process captures essentially all the information needed to study the sizes of the connected components: it is then shown that ${Z}$ behaves essentially deterministically when looked at under the appropriate scaling.

1. initialisation, ${t=0}$: Color all the vertices in green, let the current vertex be ${v=1}$ and define ${Z_0=0}$.
2. t=t+1: Color the current vertex in red (= visited); every green vertex connected to ${v}$ is colored grey (this vertex still has to be visited, but its ‘parent’ ${v}$ has already been visited): these are the children of ${v}$.
3. update ${Z}$ the following way, ${Z_t = Z_{t-1} + (\textrm{number of children of }v) - 1}$, and
• if ${v}$ has children, choose one of them and make it the new current vertex. Go to ${2}$.
• if ${v}$ has no children but has a parent, the new current vertex is its parent. Go to ${2}$.
• if ${v}$ has no children and no parent, the new current vertex is one of the remaining green vertices (if any). Go to ${2}$.
• otherwise: we are finished.

A moment of thought reveals that ${Z_0=0}$ and that ${Z}$ reaches ${-1}$ precisely when the connected component of the vertex ${1}$ has been entirely explored. Then ${Z}$ reaches ${-2}$ precisely when the second connected component has been entirely explored, etc. This is why, if ${T_0=0}$ and ${T_k = \inf \{t: Z_t = -k\}}$, the connected components have sizes exactly ${S_k = T_{k+1}-T_k}$ for ${k=0,1,2, \ldots}$. The largest component has size ${\max_k S_k}$.
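As a sanity check, the exploration above can be sketched in a few lines of Python; the code and function names are my own, not from the paper, and the graph is sampled naively for clarity.

```python
import random

def explore(n, c, seed=0):
    """Explore G(n, c/n) depth-first and record the process Z.

    Component sizes are recovered from the hitting times
    T_k = inf{t : Z_t = -k} as S_k = T_{k+1} - T_k.
    """
    rng = random.Random(seed)
    p = c / n
    # sample the graph as adjacency sets
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)

    green = set(range(n))        # unexplored vertices
    Z, z = [0], 0                # Z_0 = 0
    hitting = [0]                # T_0 = 0
    while green:
        stack = [green.pop()]    # seed a new component
        while stack:
            v = stack.pop()                # current vertex turns red
            children = adj[v] & green      # green neighbours turn grey
            green -= children
            stack.extend(children)         # depth-first
            z += len(children) - 1         # Z_t = Z_{t-1} + children - 1
            Z.append(z)
        hitting.append(len(Z) - 1)         # Z just hit a new minimum -k
    sizes = [hitting[k + 1] - hitting[k] for k in range(len(hitting) - 1)]
    return Z, sizes
```

For ${c > 1}$, the ratio `max(sizes) / n` should concentrate around ${K_c}$ as ${n}$ grows.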

Z process

3. Fluid Limit

It is quite easy to see why the rescaled process ${z^N(t) = \frac{ Z(\lfloor Nt \rfloor)}{N}}$ should behave like the solution of a differential equation (i.e. a fluid limit). While exploring the first component, ${Z_k}$ is (up to a bounded correction) the number of grey vertices at time ${k}$, so that ${N-k-Z_k}$ is essentially the number of green vertices, each of which is a child of the current vertex with probability ${\frac{c}{N}}$: this is why

$\displaystyle \mathop{\mathbb E}\Big[Z_{k+1}-Z_k \;|\; Z_k \Big] = (N-k-Z_k) \frac{c}{N} - 1 = c\Big(1-\frac{k}{N} - \frac{Z_k}{N} \Big) - 1$

so that ${\mathop{\mathbb E}\Big[ \frac{z^N(t+\Delta)-z^N(t)}{\Delta} \;|\; z^N(t) \Big] \approx c\Big(1-t-z^N(t) \Big) - 1}$ where ${\Delta = \frac{1}{N}}$. Also, the variance satisfies ${\text{Var}\Big[ \frac{z^N(t+\Delta)-z^N(t)}{\sqrt{\Delta}} \;|\; z^N(t) \Big] \approx N^{-1} c\Big(1-t-z^N(t) \Big) \rightarrow 0}$, so that all the ingredients for a fluid limit are present: ${z^N}$ converges to the solution of the differential equation

$\displaystyle \frac{d}{dt} z(t) = c\Big(1-t-z(t) \Big) - 1$

whose solution is ${z(t) = (1-t)-e^{-ct}}$. The giant component corresponds to the excursion of ${z}$ above ${0}$, which ends at the first positive zero of ${z}$: this is why ${K_c > 0}$ is implicitly defined as the solution of

$\displaystyle 1-K_c = e^{-c K_c}.$

As expected, ${\lim_{c \rightarrow 1^+} K_c = 0}$ and ${\lim_{c \rightarrow +\infty} K_c = 1}$.
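The fixed-point equation is easy to solve numerically. A small bisection sketch (my own code, not from the paper):

```python
import math

def giant_fraction(c, tol=1e-12):
    """Solve 1 - K = exp(-c K) for the non-trivial root K in (0, 1].

    For c > 1 the function f(K) = 1 - K - exp(-c K) satisfies
    f(0) = 0 with f'(0) = c - 1 > 0, and f(1) = -exp(-c) < 0,
    so the root K_c lies in (0, 1) and bisection applies.
    """
    f = lambda k: 1.0 - k - math.exp(-c * k)
    lo, hi = tol, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid            # still below the root
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For ${c=2}$ this gives ${K_2 \approx 0.797}$, and the limits ${K_c \rightarrow 0}$ as ${c \rightarrow 1^+}$ and ${K_c \rightarrow 1}$ as ${c \rightarrow +\infty}$ are easily checked numerically.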

## Random permutations and giant cycles

The other day, Nathanael Berestycki presented some very nice results on random permutations. Start with the identity permutation of ${\mathfrak{S}_N}$ and compose with random transpositions: in short, at time ${k}$ we consider the random permutation ${\sigma_k = \tau_k \circ \ldots \circ \tau_{2} \circ \tau_1}$ where the ${\tau_i}$ are transpositions chosen uniformly at random among the ${\frac{N(N-1)}{2}}$ transpositions of ${\mathfrak{S}_N}$. How long do we have to wait before seeing a permutation that has a cycle of length greater than ${\epsilon N}$, where ${\epsilon>0}$ is a fixed constant?

The answer has been known for quite a long time (O. Schramm, 2005), but no simple proof was available: the highlight of Nathanael’s talk was a short and elegant proof of the fact that we only have to wait a little more than ${N/2}$ steps to see this giant cycle! The first half of the proof is such a beauty that I cannot resist sharing it here!

1. Visual representation

First, it is more visual to represent a permutation by its cycle decomposition. Consider for example the permutation ${\sigma = (1,3,4)(5,6,8,9,10,11)(2)(7) \in \mathfrak{S}_{11}}$ that sends ${1 \rightarrow 3 \rightarrow 4 \rightarrow 1}$, etc. It can be graphically represented by the following picture:

cycle structure - visual representation

If this permutation is composed with the transposition ${(4,6)}$ , this gives the following cycle structure:

two cycles merge

If  it is composed with ${(8,11)}$ this gives,

a cycle breaks into two

It is thus pretty clear that the cycle structure of ${\sigma_{k+1}}$ can be obtained from the cycle structure of ${\sigma_k}$ in only two ways:

• either two cycles of ${\sigma_k}$ merge into one,
• or one cycle of ${\sigma_k}$ breaks into two cycles.

In short, the process ${\{\sigma_k\}_k}$ behaves like a fragmentation-coalescence process. At the beginning, the cycle structure of ${\sigma_0=\text{Id}}$ is trivial: there are ${N}$ trivial cycles of length one! Then, as time goes by, they begin to merge, and sometimes they also break down. At each step two integers ${i}$ and ${j}$ are chosen uniformly at random (with ${i \neq j}$): if ${i}$ and ${j}$ belong to the same cycle, this cycle breaks into two parts; otherwise the cycle that contains ${i}$ merges with the cycle that contains ${j}$. Of course, since at the beginning all the cycles are extremely small, the probability that a cycle breaks into two parts is extremely small: a cycle of size ${k}$ breaks into two parts with probability roughly equal to ${\Big(\frac{k}{N}\Big)^2}$.
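This fragmentation-coalescence dynamics is easy to simulate directly; here is a minimal sketch (my own code and names, not from the talk), where composing with a uniform transposition on the right has the same law as composing on the left and amounts to swapping two entries of the permutation array.

```python
import random

def transposition_walk(n, steps, seed=0):
    """Start from the identity of S_n (acting on {0, ..., n-1}) and
    compose with `steps` uniformly random transpositions."""
    rng = random.Random(seed)
    perm = list(range(n))
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)   # a uniform transposition (i, j)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def cycle_sizes(perm):
    """Sizes of the cycles of a permutation given as a list."""
    seen = [False] * len(perm)
    sizes = []
    for start in range(len(perm)):
        if not seen[start]:
            size, x = 0, start
            while not seen[x]:           # follow the cycle through `start`
                seen[x] = True
                size += 1
                x = perm[x]
            sizes.append(size)
    return sizes
```

Tracking `max(cycle_sizes(...))` along the walk is a direct way to watch for the appearance of a giant cycle.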

2. Comparison with the Erdos-Renyi random graph

Since the fragmentation part of the process can be neglected at the beginning, it is natural to compare the cycle structure at time ${k}$ with the random graph ${G(N,p_k)}$ where ${\frac{N(N-1)}{2}p_k = k}$: this graph has ${N}$ vertices and roughly ${k}$ edges since each edge is present with probability ${p_k}$. A coupling argument makes this idea more precise:

1. start with ${\sigma_0=\textrm{Id}}$ and the graph ${G_0}$ that has ${N}$ disconnected vertices ${\{1,2, \ldots, N\}}$
2. choose two different integers ${i \neq j}$ uniformly at random
3. define ${\sigma_{k+1}=(i,j) \circ \sigma_k}$
4. to define ${G_{k+1}}$, take ${G_k}$ and join the vertices ${i}$ and ${j}$ (never mind if they were already joined)
5. go to step ${2}$.
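This coupling can be checked mechanically: feed the same random pairs ${(i,j)}$ to both chains, maintain the components of ${G_k}$ with a union-find structure, and verify that each vertex and its image lie in the same component (so that every cycle of ${\sigma_k}$ sits inside one component). An illustrative sketch, my own code, using right-composition which has the same law:

```python
import random

def coupled_step_check(n, steps, seed=0):
    """Run the permutation chain and the graph chain on the same random
    pairs; assert each cycle of sigma_k stays in one component of G_k."""
    rng = random.Random(seed)
    perm = list(range(n))
    parent = list(range(n))                  # union-find parents for G_k

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]  # compose with the transposition (i, j)
        parent[find(i)] = find(j)            # add the edge {i, j} to G_k

        # x and sigma(x) always share a component, hence so does each cycle
        assert all(find(x) == find(perm[x]) for x in range(n))
    return True
```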

The fundamental remark is that each cycle of ${\sigma_k}$ is included in a connected component of ${G_k}$: this is true at the beginning, and this property obviously remains true at each step. The Erdos-Renyi random graph ${G(N,p)}$ is one of the most studied objects in probability: it is well known that for ${c<1}$, with probability tending to ${1}$, the size of the largest connected component of ${G(N,\frac{c}{N})}$ is of order ${\ln(N)}$. Since ${p_k \sim \frac{2k}{N^2}}$, this immediately shows that for ${k \approx \alpha N}$ with ${\alpha < \frac{1}{2}}$, the length of the longest cycle in ${\sigma_k}$ is of order ${\ln(N)}$ or less. This is one half of the result. It remains to work a little more to show that if we wait a little longer than ${\frac{N}{2}}$ steps, the first cycle of order ${N}$ appears.

For a permutation ${\sigma \in \mathfrak{S}_N}$ with cycle decomposition ${\sigma = \prod_{j=1}^r c_j}$ (cycles of length ${1}$ included), one can consider the quantity

$\displaystyle T(\sigma) = \sum_{j=1}^r |c_j|^2.$

This is a quantity between ${N}$ (for the identity) and ${N^2}$ (for a single cycle of length ${N}$) that measures to some extent the sizes of the cycles. In order to normalise everything and to take into account that the cycle structure changes abruptly near the critical value ${\frac{N}{2}}$, one can also observe this quantity on a logarithmic scale:

$\displaystyle s(\sigma) = \frac{ \ln\Big( T(\sigma)/N \Big) }{\ln(N)}.$

This quantity ${s(\sigma)}$ lies between ${0}$ and ${1}$, and evolves quite smoothly with time: here is what it looks like once time has been rescaled by a factor ${N}$:

fluid limit ?

The plot represents the evolution of the quantity ${s\Big(\sigma(\alpha N)\Big)}$ between time $k=0$ and $k=\frac{N}{2}$ for sizes $N=10000, 20000, 30000$.

Question: does the random quantity $s(\alpha,N)={s\Big(\sigma(\alpha N)\Big)}$ converge (in probability) towards a non-trivial constant ${S(\alpha) \in (0,1)}$? It should be the analogue of the corresponding quantity for the random graph ${G(N,\frac{2\alpha}{N})}$, so I bet it is a known result …
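One can at least estimate this quantity by direct simulation. A sketch (my own code; the normalization ${T(\sigma)/N}$ inside the logarithm is chosen so that the value lies in ${[0,1]}$):

```python
import math
import random

def s_of_alpha(n, alpha, seed=0):
    """Estimate s(sigma(floor(alpha * n))) for the transposition walk."""
    rng = random.Random(seed)
    perm = list(range(n))
    for _ in range(int(alpha * n)):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]   # one random transposition
    # T(sigma) = sum of squared cycle lengths
    seen = [False] * n
    T = 0
    for start in range(n):
        if not seen[start]:
            size, x = 0, start
            while not seen[x]:
                seen[x] = True
                size += 1
                x = perm[x]
            T += size * size
    return math.log(T / n) / math.log(n)
```

Averaging over several seeds for increasing ${N}$ would give a cheap empirical answer to whether ${s(\alpha, N)}$ concentrates.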