## Joe’s Pyramid

Yesterday, while reading the latest issue of New Scientist, I came across the following very cute riddle:

Lazy, I asked myself whether it was possible to write a ${10}$-line Python program to solve this innocent-looking enigma. The whole pyramid is entirely determined by the ${6}$ numbers lying at the bottom, each of which is an integer between ${1}$ and ${99}$. These numbers must all be different, so that there are at most ${\binom{99}{6} \approx 1.1 \times 10^9}$ possibilities to test! Brute force won’t work, my friend!
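The count quoted above is easy to double-check with the standard library. This is only a sanity check; the second quantity assumes that the left-to-right order of the bottom row matters, which the riddle statement (not reproduced here) may or may not imply.

```python
import math

# Unordered choices of 6 distinct values among 1..99.
unordered = math.comb(99, 6)

# Assumption: if the order of the bottom row matters, each choice can be
# arranged in 6! ways, which makes brute force even less attractive.
ordered = math.perm(99, 6)

print(f"{unordered:.2e} unordered, {ordered:.2e} ordered")
```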

When stupid brute force does not work, one can still try annealing/probabilistic methods: this works pretty well for Sudoku (which is NP-hard), as is brilliantly described here and there. The principle is simple: if one can find a good energy function ${E:\{1, \ldots, 99\}^6 \rightarrow \mathbb{R}}$ such that a solution to the problem corresponds to a low energy configuration, one can run MCMC, simulated annealing, etc. on the target distribution

$\displaystyle \pi(\text{configuration}) \propto e^{-\beta \cdot E(\text{configuration})}.$

The issue is that it might be very difficult to choose a sensible energy function ${E(\cdot)}$. Foolishly, I first tried ${E = -\text{Height}}$, and then ran a random walk Metropolis algorithm with the corresponding ${\pi}$ as target probability:

$\displaystyle \pi(\text{configuration}) \propto e^{\beta \cdot {\text{Height}}(\text{configuration})}$

where ${\text{Height}(\text{configuration})}$ is the number of levels that one can fill, starting from the bottom, without encountering any problem, i.e. no repetition and no number greater than ${100}$. With different values of ${\beta}$ and letting the algorithm run for a few million iterations (${5}$ min on my crappy laptop), one can easily produce configurations that are ${5}$ levels high, but the algorithm never found any real solution, i.e. a configuration with height equal to ${6}$.
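The experiment described above can be sketched as follows. Since the riddle's rule is not reproduced here, `combine` is a hypothetical placeholder for it (addition is used as a default purely for illustration), and the proposal, which resamples one bottom brick uniformly, is one simple symmetric choice rather than necessarily the exact random walk used for the runs above.

```python
import math
import operator
import random


def height(bottom, combine=operator.add, max_value=100):
    """Number of levels (bottom row included) filled without any problem.

    `combine` is an assumed stand-in for the riddle's rule giving a brick
    from the two bricks below it.
    """
    seen = set()
    row = list(bottom)
    levels = 0
    while row:
        for value in row:
            # A problem is a repeated number or a number out of range.
            if value < 1 or value > max_value or value in seen:
                return levels
            seen.add(value)
        levels += 1
        row = [combine(a, b) for a, b in zip(row, row[1:])]
    return levels


def metropolis(n_steps, beta, size=6, low=1, high=99, seed=0):
    """Metropolis chain targeting pi proportional to exp(beta * height)."""
    rng = random.Random(seed)
    state = rng.sample(range(low, high + 1), size)
    h = height(state)
    best, best_h = list(state), h
    for _ in range(n_steps):
        # Symmetric proposal: resample one bottom brick uniformly.
        proposal = list(state)
        proposal[rng.randrange(size)] = rng.randint(low, high)
        h_new = height(proposal)
        # Accept with probability min(1, exp(beta * (h_new - h))).
        if h_new >= h or rng.random() < math.exp(beta * (h_new - h)):
            state, h = proposal, h_new
            if h > best_h:
                best, best_h = list(state), h
    return best, best_h


best, best_h = metropolis(n_steps=2000, beta=2.0)
print(best, best_h)
```

The height only takes the values ${0, \ldots, 6}$, so most proposals leave it unchanged and the chain gets very little guidance, which is consistent with the difficulty reported above.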

Now I am curious whether it is possible to produce a non-stupid energy function ${E(\cdot)}$ so that this riddle is solvable in a reasonable amount of time by standard MCMC/annealing methods.

As a conclusion, I should mention that with a pen and a cup of coffee, one can easily find a solution: I will not spoil the fun, but just say that the configuration space is not that big if one thinks more carefully about it…

## Potts model and Monte Carlo Slow Down

A simple model of interacting particles

The mean field Potts model is extremely simple: there are ${N}$ interacting particles ${x_1, \ldots, x_N}$ and each one of them can be in ${q}$ different states ${1,2, \ldots, q}$. Define the Hamiltonian

$\displaystyle H_N(x) = -\frac{1}{N} \sum_{i,j} \delta(x_i, x_j)$

where ${x=(x_1, \ldots, x_N)}$ and ${\delta}$ is the Kronecker symbol. The normalization ${\frac{1}{N}}$ ensures that the energy is an extensive quantity, so that the mean energy per particle ${h_N(x) = \frac{1}{N} H_N(x)}$ does not degenerate to ${0}$ or ${-\infty}$ for large values of ${N}$. The minus sign is there to favour configurations that have a lot of particles in the same state. The Boltzmann distribution at inverse temperature ${\beta}$ on ${\{1, \ldots, q\}^N}$ is given by

$\displaystyle P_{N,\beta}(x) = \frac{1}{Z_N(\beta)} e^{-\beta H_N(x)}$

where ${Z_N(\beta)}$ is a normalization constant. Notice that if we choose a configuration uniformly at random in ${\{1, \ldots, q\}^N}$, then with overwhelming probability the ratio of particles in state ${k}$ will be close to ${\frac{1}{q}}$. In other words, if we define

$\displaystyle L^{(N)}_k(x) = \frac{1}{N} \, \Big( \textrm{Number of particles in state }k \Big)$

then ${L=(L^{(N)}_1, \ldots, L^{(N)}_q)}$ will be close to ${(\frac{1}{q}, \ldots, \frac{1}{q})}$ for a configuration taken uniformly at random. Stirling’s formula even says that the probability that ${L}$ is close to ${\nu = (\nu_1, \ldots, \nu_q)}$ is roughly ${e^{-N \, R(\nu)}}$ where

$\displaystyle R(\nu) = \nu_1 \ln(q\nu_1) + \ldots + \nu_q \ln(q\nu_q).$

Indeed ${(\frac{1}{q}, \ldots, \frac{1}{q}) = \textrm{argmin} \, R(\nu)}$. The situation is quite different under the Boltzmann distribution since it favours configurations that have a lot of particles in the same state: this is because the Hamiltonian ${H_N(x)}$ is minimized by configurations that have all the particles in the same state. In short, there is a competition between the entropy (there are a lot of configurations close to the ratio ${(\frac{1}{q}, \ldots, \frac{1}{q})}$) and the energy, which favours configurations where all the particles are in the same state.
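The Stirling estimate can be checked numerically. The sketch below compares ${R(\nu)}$ with the exact multinomial probability of observing a given vector of state counts under the uniform distribution; the values of `counts` are arbitrary illustrative choices.

```python
import math


def rate(nu):
    """R(nu) = sum_k nu_k * ln(q * nu_k), with the convention 0 * ln(0) = 0."""
    q = len(nu)
    return sum(p * math.log(q * p) for p in nu if p > 0)


def log_prob_counts(counts):
    """Exact log-probability that a uniform configuration has these state counts."""
    n = sum(counts)
    q = len(counts)
    # log of the multinomial coefficient n! / (c_1! ... c_q!)
    log_multinomial = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_multinomial - n * math.log(q)


counts = (1500, 900, 600)  # illustrative: N = 3000 particles, q = 3 states
n = sum(counts)
nu = tuple(c / n for c in counts)

exact = -log_prob_counts(counts) / n
print(exact, rate(nu))
```

The two printed numbers should agree up to a correction of order ${\ln(N)/N}$, and `rate` vanishes at the uniform ratio, in line with the argmin statement above.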

With a little more work, one can show that there is a critical inverse temperature ${\beta_c}$ such that:

• for ${\beta < \beta_c}$ the entropy wins the battle: the most probable configurations are close to the ratio ${(\frac{1}{q}, \ldots, \frac{1}{q})}$
• for ${\beta > \beta_c}$ the energy effect shows up: there are ${q}$ most probable configurations that are the permutations of ${(a_{\beta},b_{\beta},b_{\beta}, \ldots, b_{\beta})}$ where ${a_{\beta}}$ and ${b_{\beta}}$ are computable quantities.
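Here is a sketch of where these two regimes come from, assuming that the double sum in ${H_N}$ runs over all ordered pairs ${(i,j)}$, diagonal included. Grouping the particles by state, the Hamiltonian only depends on the ratio vector:

$\displaystyle H_N(x) = -\frac{1}{N} \sum_{k=1}^q \big( N \, L^{(N)}_k(x) \big)^2 = -N \sum_{k=1}^q L^{(N)}_k(x)^2.$

Combining this with the Stirling estimate for the number of configurations whose ratio vector is close to ${\nu}$ gives

$\displaystyle P_{N,\beta}(L \approx \nu) \propto e^{-N \, R(\nu)} \, e^{\beta N \sum_k \nu_k^2} = e^{-N \, F_{\beta}(\nu)}, \qquad F_{\beta}(\nu) = R(\nu) - \beta \sum_{k=1}^q \nu_k^2.$

The most probable ratios are thus the minimizers of ${F_{\beta}}$ over probability vectors: for small ${\beta}$ the entropy term ${R}$ dominates and the minimizer is the uniform ratio, while for large ${\beta}$ the energy term rewards putting most of the mass on a single state.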

The point is that above ${\beta_c}$ the system has more than one stable equilibrium point. Perhaps more importantly, if we compute the energy of these most probable states

$\displaystyle h(\beta) = \lim_{N \to \infty} \frac{1}{N} H_N(\textrm{most probable state})$

then this function has a discontinuity at ${\beta=\beta_c}$. I will try to show in the weeks to come how this behaviour can dramatically slow down the usual Monte Carlo approaches to the study of this kind of model.
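The jump can be observed numerically. The sketch below assumes that the double sum in ${H_N}$ runs over all ordered pairs ${(i,j)}$, so that a configuration with ratio vector ${\nu}$ has energy per particle ${-\sum_k \nu_k^2}$ and the most probable ratios minimize ${R(\nu) - \beta \sum_k \nu_k^2}$. It also restricts the search to ratios of the form ${(a,b,\ldots,b)}$ and uses a plain grid search; the function names are illustrative.

```python
import math


def free_energy(a, beta, q):
    """R(nu) - beta * sum(nu_k^2) at nu = (a, b, ..., b), b = (1 - a) / (q - 1)."""
    b = (1.0 - a) / (q - 1)
    entropy_cost = a * math.log(q * a) + (q - 1) * b * math.log(q * b)
    return entropy_cost - beta * (a * a + (q - 1) * b * b)


def most_probable_a(beta, q=3, grid=3000):
    """Grid search for the minimizing a in [1/q, 1)."""
    candidates = [1.0 / q + (1.0 - 1.0 / q) * k / grid for k in range(grid)]
    return min(candidates, key=lambda a: free_energy(a, beta, q))


def energy_per_particle(beta, q=3):
    """Energy per particle -sum(nu_k^2) at the minimizing ratio (a, b, ..., b)."""
    a = most_probable_a(beta, q)
    b = (1.0 - a) / (q - 1)
    return -(a * a + (q - 1) * b * b)


for beta in (1.2, 1.3, 1.4, 1.5):
    print(beta, round(most_probable_a(beta), 3), round(energy_per_particle(beta), 3))
```

For ${q=3}$, the values printed for ${\beta=1.3}$ and ${\beta=1.4}$ should fall on opposite sides of the jump: the minimizing ${a}$ leaves ${\frac{1}{3}}$ abruptly rather than continuously.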

Hugo Touchette has a very nice review of statistical physics that I like a lot and a good survey of the Potts model. Also T. Tao has a very nice exposition of related models. The blog of Georg von Hippel is dedicated to similar models on lattices, which are far more complex than the mean field approximation presented here.

MCMC Simulations

It is extremely easy to simulate this mean field Potts model since we only need to keep track of the ratio ${L=(L_1, \ldots, L_q)}$ to have an accurate picture of the system. For example, a typical Markov Chain Monte Carlo approach would run as follows:

• choose a particle ${x_i}$ uniformly at random in ${\{1,2, \ldots, N\}}$
• try to switch its value uniformly in ${\{1,2, \ldots, q\} \setminus \{x_i\}}$
• compute the Metropolis ratio
• update accordingly.
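The four steps above can be sketched as follows, tracking only the state counts. The energy change used in the Metropolis ratio assumes that the double sum in ${H_N}$ runs over all ordered pairs ${(i,j)}$, so that ${H_N = -\frac{1}{N}\sum_k n_k^2}$ where ${n_k}$ is the number of particles in state ${k}$; the function name and default arguments are illustrative.

```python
import math
import random


def simulate_potts(n_particles=100, q=3, beta=1.5, n_steps=100_000, seed=0):
    """Metropolis chain for the mean field Potts model, tracked via state counts."""
    rng = random.Random(seed)
    counts = [n_particles] + [0] * (q - 1)  # all particles start in the first state
    trajectory = []
    for _ in range(n_steps):
        # Picking a particle uniformly amounts to picking its current state
        # with probability counts[k] / N.
        old = rng.choices(range(q), weights=counts)[0]
        # Propose one of the q - 1 other states uniformly.
        new = rng.randrange(q - 1)
        if new >= old:
            new += 1
        # Moving one particle from `old` to `new` changes sum_k n_k^2 by
        # 2 * (n_new - n_old + 1), hence this change in H_N.
        delta_h = -2.0 * (counts[new] - counts[old] + 1) / n_particles
        # Metropolis rule: accept with probability min(1, exp(-beta * delta_h)).
        if delta_h <= 0 or rng.random() < math.exp(-beta * delta_h):
            counts[old] -= 1
            counts[new] += 1
        trajectory.append(tuple(c / n_particles for c in counts))
    return trajectory


trajectory = simulate_potts(n_steps=10_000)
print(trajectory[-1])
```

Each entry of `trajectory` is a ratio vector, i.e. a point of the simplex, which can then be plotted in barycentric coordinates.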

If we do that ${10^5}$ times for ${q=3}$ states at inverse temperature ${\beta=1.5}$ and for ${100}$ particles (which is fine since we only need to keep track of the ${3}$-dimensional ratio vector) and plot the result in barycentric coordinates we get a picture that looks like:

Here I started with a configuration where all the particles were in the same state, i.e. a ratio vector equal to ${(1,0,0)}$. We can see that even with ${10^5}$ steps, the algorithm struggles to go from one most probable position ${(a,b,b)}$ to the other two, ${(b,a,b)}$ and ${(b,b,a)}$: in this simulation, one of the most probable states has not even been visited! Indeed, this approach was extremely naive, and it is quite interesting to try to come up with better algorithms. Btw, Christian Robert’s blog has tons of interesting stuff related to MCMC and how to improve on the naive approach presented here.