A brilliant talk by Persi Diaconis!
A simple model of interacting particles
The mean field Potts model is extremely simple: there are $N$ interacting particles, and each one of them can be in one of $q$ different states. Define the Hamiltonian
$$H(\sigma) \;=\; -\frac{1}{N} \sum_{1 \le i < j \le N} \delta(\sigma_i, \sigma_j),$$
where $\sigma = (\sigma_1, \ldots, \sigma_N) \in \{1, \ldots, q\}^N$ and $\delta$ is the Kronecker symbol. The normalization by $N$ ensures that the energy is an extensive quantity, so that the mean energy per particle does not degenerate to $0$ or $\infty$ for large values of $N$. The minus sign is here to favour configurations that have a lot of particles in the same state. The Boltzmann distribution at inverse temperature $\beta$ on $\{1, \ldots, q\}^N$ is given by
$$\pi_\beta(\sigma) \;=\; \frac{1}{Z_\beta}\, e^{-\beta H(\sigma)},$$
where $Z_\beta$ is a normalization constant. Notice that if we choose a configuration uniformly at random in $\{1, \ldots, q\}^N$, with overwhelming probability the ratio of particles in each state will be close to $\frac{1}{q}$. Also, it is obvious that if we define the ratio vector
$$R(\sigma) \;=\; (R_1(\sigma), \ldots, R_q(\sigma)), \qquad R_k(\sigma) \;=\; \frac{1}{N}\,\#\{i : \sigma_i = k\},$$
then $R(\sigma)$ will be close to $(\frac{1}{q}, \ldots, \frac{1}{q})$ for a configuration taken uniformly at random. Stirling's formula even says that the probability that $R(\sigma)$ is close to $r = (r_1, \ldots, r_q)$ behaves like $e^{-N I(r)}$, where
$$I(r) \;=\; \ln q + \sum_{k=1}^q r_k \ln r_k.$$
Indeed, there are roughly $e^{-N \sum_k r_k \ln r_k}$ configurations with ratio vector close to $r$, each of which has probability $q^{-N}$. The situation is quite different under the Boltzmann distribution, since it favours the configurations that have a lot of particles in the same state: this is because the Hamiltonian is minimized by configurations that have all the particles in the same state. In short, there is a competition between the entropy (there are a lot of configurations close to the uniform ratio $(\frac{1}{q}, \ldots, \frac{1}{q})$) and the energy, which favours the configurations where all the particles are in the same state.
With a little more work, one can show that there is a critical inverse temperature $\beta_c$ such that:
- for $\beta < \beta_c$ the entropy wins the battle: the most probable configurations are close to the uniform ratio $(\frac{1}{q}, \ldots, \frac{1}{q})$;
- for $\beta > \beta_c$ the energy effect shows up: there are $q$ most probable configurations, which are the permutations of $(a, b, \ldots, b)$, where $a > b$ are computable quantities.
The point is that above $\beta_c$ the system has more than one stable equilibrium point. Maybe more importantly, if we compute the energy $e(\beta)$ of these most probable states,
then this function has a discontinuity at $\beta_c$. I will try to show in the weeks to come how this behaviour can dramatically slow down the usual Monte Carlo approaches to the study of these kinds of models.
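To make the energy–entropy competition quantitative, here is a sketch of the standard mean-field computation (the notation $f_\beta$ is mine, not from the discussion above): since the Hamiltonian depends on the configuration only through the ratio vector, the probability of observing a ratio vector close to $r$ behaves like $e^{N f_\beta(r)}$ up to normalization, where

```latex
% H(\sigma) = -(N/2)\sum_k R_k(\sigma)^2 + O(1), so combining the Boltzmann
% weight with the entropy term gives the free-energy functional
f_\beta(r) \;=\; \frac{\beta}{2}\sum_{k=1}^q r_k^2 \;-\; \sum_{k=1}^q r_k \ln r_k .
```

For $\beta < \beta_c$ this functional has a unique maximizer, the uniform vector $(\frac{1}{q}, \ldots, \frac{1}{q})$; for $\beta > \beta_c$ it has $q$ symmetric maximizers, the permutations of $(a, b, \ldots, b)$.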
Hugo Touchette has a very nice review of statistical physics that I like a lot, and a good survey of the Potts model. Also, T. Tao has a very nice exposition of related models. The blog of Georg von Hippel is dedicated to similar models on lattices, which are far more complex than the mean field approximation presented here.
It is extremely easy to simulate this mean field Potts model, since we only need to keep track of the ratio vector $R(\sigma)$ to have an accurate picture of the system. For example, a typical Markov chain Monte Carlo approach would run as follows:
- choose a particle $i$ uniformly at random in $\{1, \ldots, N\}$
- try to switch its value to a state chosen uniformly in $\{1, \ldots, q\}$
- compute the Metropolis ratio
- update accordingly.
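The scheme above can be sketched in a few lines of Python; this is a minimal sketch of my own (the parameter values are arbitrary assumptions), and the only state it keeps is the vector of counts per state:

```python
import math
import random

def potts_mcmc(N=300, q=3, beta=2.0, steps=50_000, seed=1):
    """Metropolis sampler for the mean-field Potts model.

    Only the counts n_k = #particles in state k are stored, since
    H(sigma) = -(1/N) * sum_k n_k (n_k - 1) / 2 depends on the
    configuration through these counts alone.
    """
    rng = random.Random(seed)
    counts = [N] + [0] * (q - 1)   # all particles start in state 0
    for _ in range(steps):
        # picking a particle uniformly = picking state a with prob. n_a / N
        a = rng.choices(range(q), weights=counts)[0]
        b = rng.randrange(q)       # proposed new state
        if a == b:
            continue
        # energy change of moving one particle from state a to state b
        dH = (counts[a] - 1 - counts[b]) / N
        if dH <= 0 or rng.random() < math.exp(-beta * dH):
            counts[a] -= 1
            counts[b] += 1
    return [c / N for c in counts]  # the ratio vector
```

Since moving one particle from state $a$ to state $b$ changes the energy by $(n_a - 1 - n_b)/N$, no configuration array is ever needed.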
If we do that a large number of times for $q = 3$ states at an inverse temperature $\beta > \beta_c$ and for a large number $N$ of particles (which is fine since we only need to keep track of the $q$-dimensional ratio vector) and plot the result in barycentric coordinates, we get a picture that looks like:
Here I started with a configuration where all the particles were in the same state, i.e. a ratio vector equal to $(1, 0, 0)$. We can see that even after many steps, the algorithm struggles to go from one most probable position to the other two, and in this simulation one of the most probable states has not even been visited! Indeed, this approach was extremely naive, and it is quite interesting to try to come up with better algorithms. By the way, Christian Robert's blog has tons of interesting material related to MCMC and how to speed up the naive approach presented here.
Have you already heard about the 17×17 Challenge? It all started 7 months ago, and still no solution has been found. I have the feeling, though, that no really serious attempt at solving the problem has been made, even if very encouraging results have been obtained. It could be very interesting to keep track of the progress made, so results will be published here. Send what you have achieved so far!
In the next few weeks, I will try to write about the Monte Carlo approaches described in the excellent book “Monte Carlo Strategies in Scientific Computing” by Jun Liu: I do not think that one can easily solve the problem with a direct implementation of these methods, but I want to know more about them. These are all more or less Markov chain Monte Carlo methods: I have tried the “parallel tempering” method, which gives results that are not that bad, and I was planning to check the “flat histogram” method. “Simulated annealing” methods have successfully been used to solve the Sudoku problem, though: see Christian Robert's blog for example. I am also planning to continue my reading of the fascinating book “Information, Physics, and Computation” by Marc Mezard: the methods described in this book (belief propagation, the k-SAT problem, etc.) are better adapted to the 17×17 challenge, I think, and I am quite optimistic about it.
June 3, 2010 at 11:07 pm (coagulation, fluid limit, fragmentation, markov chain, probability, random permutation)
Tags: Erdos Renyi, giant cycle, markov chain, phase transition, random graph, random permutation
The other day, Nathanael Berestycki presented some very nice results on random permutations. Start with the identity permutation of the symmetric group $\mathfrak{S}_N$ and compose with random transpositions: in short, at time $t$ we consider the random permutation $\sigma_t = \tau_t \circ \cdots \circ \tau_1$, where the $\tau_k$ are transpositions chosen uniformly at random among the $\binom{N}{2}$ transpositions of $\mathfrak{S}_N$. How long do we have to wait before seeing a permutation that has a cycle of length greater than $cN$, where $c > 0$ is a fixed constant?
The answer has been known for quite a long time (O. Schramm, 2005), but no simple proof was available: the highlight of Nathanael's talk was a short and elegant proof of the fact that we only have to wait a little bit more than $N/2$ steps to see this giant cycle! The first half of the proof is such a beauty that I cannot resist sharing it here!
1. Visual representation
First, it is more visual to represent a permutation by its cycle decomposition. Consider for example a permutation and its cycles: it can be graphically represented by the following picture:
If this permutation is composed with a transposition, this gives the following cycle structure:
If it is composed with another transposition, this gives:
It is thus pretty clear that the cycle structure of $\sigma_{t+1}$ can be obtained from the cycle structure of $\sigma_t$ in only two ways:
- either two cycles of $\sigma_t$ merge into one,
- or one cycle of $\sigma_t$ breaks into two cycles.
In short, the process behaves like a fragmentation-coalescence process. At the beginning, the cycle structure of $\sigma_0 = \mathrm{id}$ is trivial: there are $N$ trivial cycles of length one! Then, as time goes by, they begin to merge, and sometimes they also break down. At each step two integers $i$ and $j$ are chosen uniformly at random (with $i \neq j$): if $i$ and $j$ belong to the same cycle, this cycle breaks into two parts; otherwise, the cycle that contains $i$ merges with the cycle that contains $j$. Of course, since at the beginning all the cycles are extremely small, the probability that a cycle breaks into two parts is extremely small: a cycle of size $k$ breaks into two parts with probability roughly equal to $(k/N)^2$.
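This fragmentation-coalescence dynamics is easy to simulate directly on the permutation. The following sketch (my own, with arbitrary parameters) composes random transpositions with the identity and reads off the cycle lengths:

```python
import random

def random_transpositions(N=2000, steps=None, seed=3):
    """Compose `steps` uniform random transpositions with the identity
    and return the cycle lengths of the result, largest first."""
    rng = random.Random(seed)
    if steps is None:
        steps = N          # default: well past the N/2 transition
    p = list(range(N))     # the identity permutation
    for _ in range(steps):
        i, j = rng.sample(range(N), 2)   # a uniform transposition (i j)
        p[i], p[j] = p[j], p[i]          # compose with (i j)
    # extract the cycle decomposition by following the permutation
    seen, cycles = [False] * N, []
    for start in range(N):
        if not seen[start]:
            length, x = 0, start
            while not seen[x]:
                seen[x] = True
                x = p[x]
                length += 1
            cycles.append(length)
    return sorted(cycles, reverse=True)
```

Running it with `steps` slightly above $N/2$ versus slightly below makes the appearance of a macroscopic cycle quite visible.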
2. Comparison with the Erdos-Renyi random Graph
Since the fragmentation part of the process can be neglected at the beginning, it is natural to compare the cycle structure at time $t$ with the random graph $G(N, p_t)$ where $p_t = 1 - \big(1 - 1/\binom{N}{2}\big)^t \approx t/\binom{N}{2}$: this graph has $N$ vertices and roughly $t$ edges, since each edge is open with probability $p_t$. A coupling argument makes this idea more precise:
- start with $\sigma_0 = \mathrm{id}$ and the graph $G_0$ that has $N$ disconnected vertices
- choose two different integers $i \neq j$ uniformly at random
- to define $\sigma_{t+1}$, take $\sigma_{t+1} = (i\,j) \circ \sigma_t$; to define $G_{t+1}$, join the vertices $i$ and $j$ of $G_t$ (never mind if they were already joined)
- go to step 2.
The fundamental remark is that each cycle of $\sigma_t$ is included in a connected component of $G_t$: this is true at the beginning, and this property obviously remains true at each step. The Erdos-Renyi random graph is one of the most studied objects: it is well known that for $p = \frac{c}{N}$ with $c < 1$, with probability tending to $1$, the size of the largest connected component of $G(N, \frac{c}{N})$ is of order $\ln N$. Since $p_t \approx 2t/N^2$, this immediately shows that for $t \le \frac{cN}{2}$ with $c < 1$, the length of the longest cycle in $\sigma_t$ is of order $\ln N$ or less. This is one half of the result. It remains to work a little bit to show that if we wait a little bit more than $N/2$ steps, the first cycle of order $N$ appears.
For a permutation with cycle decomposition (cycles of length included), one can consider the quantity
This is a quantity that is minimal for the identity and that measures to some extent the sizes of the cycles. In order to normalize everything, and to take into account that the cycle structure abruptly changes near the critical value $N/2$, one can also observe this quantity on a logarithmic scale:
This normalized quantity evolves quite smoothly with time: look at what it looks like, once time has been rescaled by a factor $N$:
The plot represents the evolution of this quantity, as a function of rescaled time, for several values of the dimension $N$.
Question: does this random quantity converge (in probability) towards a non-trivial constant? It must be the analogue of a known quantity for the random graph $G(N, \frac{c}{N})$, so I bet it is a known result…
I read today about Doob h-transforms in the Rogers-Williams book… It is done quite quickly there, so I decided to practice on some simple examples to see how this works.
So we have a Markov process $(X_t)$ living in a state space $E$, with transition kernel $p_t(x, y)$, and we want to see what this process looks like if we condition on the event $\{X_T \in A\}$, where $A$ is a subset of the state space. To fix the notations, we define $h(t, x) = \mathbb{P}(X_T \in A \mid X_t = x)$ and write $\hat{p}$ for the conditioned semi-group. The conditioned semi-group is quite easily computed from $p$ and $h$. Indeed, for $s < t \le T$, $\mathbb{P}(X_t = y \mid X_s = x, X_T \in A) = \mathbb{P}(X_t = y, X_T \in A \mid X_s = x)/\mathbb{P}(X_T \in A \mid X_s = x)$, and this also equals
$$\hat{p}_{s,t}(x, y) \;=\; p_{t-s}(x, y)\,\frac{h(t, y)}{h(s, x)}.$$
Notice also that $\hat{p}$ is indeed a Markov kernel, in the sense that $\int \hat{p}_{s,t}(x, y)\,dy = 1$: the only property needed for that is
$$h(s, x) \;=\; \int p_{t-s}(x, y)\, h(t, y)\, dy.$$
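Written out (with the notation $\hat{p}_{s,t}(x, y) = p_{t-s}(x, y)\,h(t, y)/h(s, x)$ for the conditioned kernel, consistent with the discussion above), the normalization check is a one-liner:

```latex
\int \hat{p}_{s,t}(x,y)\,dy
  \;=\; \frac{1}{h(s,x)} \int p_{t-s}(x,y)\, h(t,y)\, dy
  \;=\; \frac{h(s,x)}{h(s,x)} \;=\; 1 .
```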
In fact, we could take any function $h$ that satisfies this equality, define a new Markovian kernel, and study the associated Markov process. That's what people usually do, by the way.
Remark 1: we almost never know explicitly the quantity $h(t, x) = \mathbb{P}(X_T \in A \mid X_t = x)$, except in some extremely simple cases!
Before trying these ideas on some simple examples, let us see what this says about the generator of the process:
- continuous time Markov chains, finite state space: let us suppose that the intensity matrix is $Q = (q(x, y))$ and that we want to know the dynamics on $[0, T]$ of this Markov chain conditioned on the event $\{X_T \in A\}$. Indeed, $\hat{p}_{t, t+\delta}(x, y) = p_\delta(x, y)\,\frac{h(t+\delta, y)}{h(t, x)}$ and $p_\delta(x, y) = \delta\, q(x, y) + o(\delta)$ for $y \neq x$, so that in the limit $\delta \to 0$ we see that at time $t$, the intensity of the jump from $x$ to $y$ of the conditioned Markov chain is
$$\hat{q}_t(x, y) \;=\; q(x, y)\,\frac{h(t, y)}{h(t, x)}.$$
Notice how this behaves as $t \to T$: if at a time $t$ close to $T$ the Markov chain is in a state $x \notin A$, then $h(t, x)$ is small while $h(t, y) \approx 1$ for $y \in A$, so the intensity of the jump from $x$ to $y$ blows up: the conditioning forces the chain into $A$.
- diffusion processes: this time, consider a $1$-dimensional diffusion $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$ on $[0, T]$, conditioned on the event $\{X_T \in A\}$, and define as before $h(t, x) = \mathbb{P}(X_T \in A \mid X_t = x)$. The generator of the (non-homogeneous) conditioned diffusion is defined at time $t$ by
$$\hat{\mathcal{L}}_t f(x) \;=\; \lim_{\delta \to 0} \frac{\mathbb{E}[f(X_{t+\delta}) \mid X_t = x,\, X_T \in A] - f(x)}{\delta},$$
so that if $\mathcal{L} = b\,\partial_x + \frac{\sigma^2}{2}\,\partial^2_{xx}$ is the generator of the original diffusion we get
$$\hat{\mathcal{L}}_t f \;=\; \frac{\mathcal{L}(h f) + \partial_t h \cdot f}{h}.$$
Because $\partial_t h + \mathcal{L} h = 0$, this also reads
$$\hat{\mathcal{L}}_t f \;=\; \mathcal{L} f + \sigma^2\,\frac{\partial_x h}{h}\,\partial_x f.$$
This means that the conditioned diffusion follows the SDE:
$$dX_t \;=\; \Big( b(X_t) + \sigma^2(X_t)\,\partial_x \ln h(t, X_t) \Big)\,dt + \sigma(X_t)\,dW_t.$$
The volatility function $\sigma(\cdot)$ remains the same, while an additional drift $\sigma^2\,\partial_x \ln h$ shows up.
We will try these ideas on some examples where the probability densities are extremely simple. Notice that in the case of diffusions, if we take $A = \{a\}$ a single point, the function $h$ is identically equal to $0$ (except in degenerate cases): to condition on the event $\{X_T = a\}$, we need instead to take $h(t, x)$ to be the transition probability density $p_{T-t}(x, a)$. This follows from the approximation $\mathbb{P}(X_T \in [a, a + da] \mid X_t = x) \approx p_{T-t}(x, a)\,da$. Let's do it:
- Brownian Bridge on $[0, T]$: in this case $h(t, x) = p_{T-t}(x, a) \propto \exp\big(-\frac{(a - x)^2}{2(T - t)}\big)$, so that the additional drift reads $\frac{a - x}{T - t}$: a Brownian bridge ending at $a$ at time $T$ follows the SDE
$$dX_t \;=\; \frac{a - X_t}{T - t}\,dt + dW_t.$$
This might not be the best way to simulate a Brownian bridge though!
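Still, as a sanity check, a naive Euler discretization of this SDE does produce a path pinned at the prescribed endpoint. This is a sketch of my own; the endpoint $a$, horizon $T$ and step count are arbitrary assumptions:

```python
import math
import random

def brownian_bridge(a=1.0, T=1.0, n_steps=2000, seed=7):
    """Euler scheme for dX_t = (a - X_t)/(T - t) dt + dW_t, X_0 = 0.

    The h-transform drift (a - x)/(T - t) steers the path towards
    X_T = a; the last step pulls the path onto the endpoint.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x, path = 0.0, [0.0]
    for k in range(n_steps):
        t = k * dt                       # T - t >= dt > 0 throughout
        x += (a - x) / (T - t) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

The final value differs from $a$ only by the noise of the last few steps, of order $\sqrt{dt}$.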
- Poisson Bridge on $[0, T]$: we condition a Poisson process of rate $\lambda$ on the event $\{X_T = n\}$. The intensity matrix is simply $q(k, k+1) = \lambda$ and $q(k, j) = 0$ everywhere else, while the transition probabilities are given by $p_t(k, k+j) = e^{-\lambda t}\,\frac{(\lambda t)^j}{j!}$. This is why at time $t$, the intensity of a jump from $k$ to $k+1$ is given by
$$\hat{q}_t(k, k+1) \;=\; \lambda\,\frac{p_{T-t}(k+1, n)}{p_{T-t}(k, n)} \;=\; \frac{n - k}{T - t}.$$
Again, that might not be the most efficient way to simulate a Poisson bridge! Notice how the intensity $\lambda$ has disappeared…
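The conditioned intensity $(n - k)/(T - t)$ is simple enough to discretize directly. In the following sketch (my own, with the per-step jump probability capped at one), the cap forces the remaining jumps in the last steps, so the bridge always ends exactly at $n$:

```python
import random

def poisson_bridge(n=5, n_steps=10_000, seed=11):
    """Simulate the conditioned jump intensity (n - k)/(T - t) on a grid.

    With steps of size dt = T/n_steps and m steps remaining, the one-step
    jump probability (n - k) dt / (T - t) simplifies to (n - k)/m: the
    rate lambda of the original Poisson process has completely cancelled.
    """
    rng = random.Random(seed)
    k = 0
    for m in range(n_steps, 0, -1):      # m = number of steps remaining
        if k < n and rng.random() < min(1.0, (n - k) / m):
            k += 1
    return k
```

Once the number of missing jumps equals the number of remaining steps, every step jumps, which is the discrete shadow of the intensity blowing up as $t \to T$.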
- Ornstein-Uhlenbeck Bridge: let's consider the usual OU process given by the dynamics $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$: the invariant probability is the usual centred Gaussian distribution. Say that we want to know how such an OU process behaves if we condition on the event $\{X_T = a\}$. Because $p_t(x, \cdot)$ is the density of $\mathcal{N}\big(x e^{-t},\, 1 - e^{-2t}\big)$, we find that the conditioned O-U process follows the SDE
$$dX_t \;=\; \Big( -X_t + \frac{2\, e^{-(T-t)}\big(a - X_t\, e^{-(T-t)}\big)}{1 - e^{-2(T-t)}} \Big)\,dt + \sqrt{2}\,dW_t.$$
If we Taylor expand the additional drift as $t \to T$, it can be seen that this term behaves exactly as in the case of the Brownian bridge. Below is a plot of an O-U process conditioned on its final value, starting from a fixed initial position.
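An Euler discretization of the conditioned SDE shows the same pinning behaviour. This is a sketch of my own (writing $s = T - t$ for the time remaining; the endpoint $a$ and step count are arbitrary assumptions):

```python
import math
import random

def ou_bridge(a=1.0, T=1.0, n_steps=2000, seed=13):
    """Euler scheme for the conditioned OU process, X_0 = 0:
    dX_t = ( -X_t + 2 e^{-s} (a - X_t e^{-s}) / (1 - e^{-2s}) ) dt
           + sqrt(2) dW_t,   with s = T - t.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = 0.0
    for k in range(n_steps):
        s = T - k * dt                   # time remaining, s >= dt > 0
        drift = -x + 2.0 * math.exp(-s) * (a - x * math.exp(-s)) \
                        / (1.0 - math.exp(-2.0 * s))
        x += drift * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
    return x
```

Near $t = T$ the drift is approximately $(a - X_t)/(T - t)$, i.e. exactly the Brownian bridge drift, which is what pins the path at the endpoint.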