SMC sampler

Sequential Monte Carlo (SMC) methods are usually designed to deal with probability distributions of increasing dimension. The canonical example is the sequence of distributions {\pi_n(X_{1:n}) = \mathbb{P}(X_{1:n} \, | \, Y_{1:n})}, where {\{ X_k \}_{k \geq 0}} is a hidden Markov chain and {\{ Y_k \}_{k \geq 0}} is an observation process. The notation {X_{1:n}} is a shortcut for {(X_1, X_2, \ldots, X_n)}. This setting is usually called a Hidden Markov Model in the literature. If the Markov chain {X_k} evolves in {\mathbb{R}^d} then the distribution {\pi_n} is a probability distribution on {\big( \mathbb{R}^d \big)^n}.

It turns out that one can use the same kind of ideas to study a single distribution {\pi}. If {\pi} is difficult to sample from directly and traditional MCMC methods struggle (because of dimensionality, multi-modality, poor mixing properties, etc.), one can try to use SMC methods to study the distribution {\pi}. To this end, one can introduce a sequence of bridging distributions

\displaystyle \begin{array}{rcl} \pi_0, \pi_1, \ldots, \pi_n \end{array}

where {\pi_0} is typically very easy to sample from and {\pi_n=\pi}. A traditional choice is {\pi_k(x) \propto \pi(x)^{\alpha_k}} where {0 \leq \alpha_1 < \alpha_2 < \ldots < \alpha_n=1}. The larger the index {k}, the more intricate the distribution {\pi_k}. The problem is that each distribution {\pi_k} has the same dimensionality as the original distribution {\pi}, so that it is not obvious how to use all the machinery of SMC methods. For the sake of concreteness, let us suppose that {\pi} is a distribution on {\mathbb{R}^d}. A fundamental remark, which is the basis of what are now known as Sequential Monte Carlo samplers, is that {\pi_k(x_k)} can be seen as the marginal distribution of the probability

\displaystyle \begin{array}{rcl} \Pi_k(x_1, \ldots, x_k) = \pi_k(x_k) L_k(x_k, x_{k-1})L_{k-1}(x_{k-1}, x_{k-2}) \ldots L_2(x_2,x_1) \end{array}

where the {L_j(\cdot, \cdot)} are Markov kernels in the sense that {\int L_j(x, y) \, dy = 1} for any {x}; integrating {\Pi_k} over {x_1, \ldots, x_{k-1}} therefore leaves exactly {\pi_k(x_k)}. The distribution {\Pi_k} lives on {\big( \mathbb{R}^d \big)^k} and one can use the traditional SMC methods to explore the sequence of distributions {\Pi_1, \ldots, \Pi_n}. This algorithm was devised a few years ago by Pierre Del Moral, Arnaud Doucet and Ajay Jasra. It is extremely powerful, and several other algorithms can be seen as particular cases of it. Because I had never implemented the algorithm before, I tried yesterday to see how this worked on a very simple example. My target distribution was an equal mixture of seven Gaussian distributions in {\mathbb{R}^2} with unit covariance matrices, centred at the vertices of a regular heptagon with radius {20}. I started with {N=3000} particles at the centre of the heptagon and let them evolve according to a random walk. When the Effective Sample Size (ESS) was too small (less than {N/2}), the whole particle system was resampled. In order to obtain a sweet movie, I used {100} bridge distributions {\pi_k \propto \pi^{k/100}}.

The result is clear and quite impressive. Admittedly, this could have been made much more efficient by using far fewer bridge distributions. Some recent theory has been developed to analyse the stability of the algorithm in high-dimensional situations. The analysis described in the article uses a scaling-limit approach very similar to some ideas used to analyse MCMC algorithms, and to some more recent developments along the same lines that we have done with Andrew and Natesh. Very exciting stuff!
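For readers who would like to tinker with the algorithm, here is a minimal sketch of a tempered SMC sampler on this heptagon example. It is not the script used for the movie: the number of particles, the random-walk step size, the number of moves per temperature and the resampling rule below are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: equal mixture of 7 unit-covariance Gaussians centred at the
# vertices of a regular heptagon of radius 20, as in the example above.
angles = 2 * np.pi * np.arange(7) / 7
centres = 20.0 * np.column_stack([np.cos(angles), np.sin(angles)])

def log_pi(x):
    """Unnormalised log-density of the target mixture, evaluated row-wise."""
    d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.logaddexp.reduce(-0.5 * d2, axis=1)

N, n_bridge, step = 3000, 100, 2.0            # particles, bridges, RW std (illustrative)
alphas = np.arange(n_bridge + 1) / n_bridge   # temperatures k/100

x = np.zeros((N, 2))                          # all particles start at the centre
logw = np.zeros(N)

for a_prev, a_new in zip(alphas[:-1], alphas[1:]):
    # A few random-walk Metropolis moves leaving pi^a_prev invariant
    # (with a_prev = 0 every proposal is accepted: a plain random walk)
    for _ in range(3):
        prop = x + step * rng.standard_normal(x.shape)
        accept = np.log(rng.uniform(size=N)) < a_prev * (log_pi(prop) - log_pi(x))
        x[accept] = prop[accept]

    # Importance-weight update for the tempering increment pi^(a_new - a_prev)
    logw += (a_new - a_prev) * log_pi(x)

    # Effective sample size; resample the whole system when it drops below N/2
    w = np.exp(logw - logw.max()); w /= w.sum()
    if 1.0 / np.sum(w ** 2) < N / 2:
        x = x[rng.choice(N, size=N, p=w)]
        logw = np.zeros(N)

# The particles should now sit on the seven modes at radius 20
print("mean distance from the centre:", np.linalg.norm(x, axis=1).mean())
```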

[EDIT: 2 October 2012]
I received several emails asking me to describe how to obtain the short animation showing the evolution of the particle system. The code is available here. It is a quick python script producing a bunch of .png files. It then suffices to stick them together to obtain an animated .gif file. This can, for example, easily be done using the free (and very good) photoshop-like software GIMP: open all the png files as different layers, then export the result as a .gif file with the option ‘animated gif’ checked. Voilà!
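If you would rather stay in python, the same stitching can be done with the Pillow library; a small sketch, assuming the frames are saved under hypothetical names frame_000.png, frame_001.png, and so on:

```python
from pathlib import Path
from PIL import Image

# Load the .png frames in order (assumed to be named frame_000.png, frame_001.png, ...)
frames = [Image.open(p) for p in sorted(Path(".").glob("frame_*.png"))]

# Stack them into an animated gif: 50 ms per frame, looping forever
frames[0].save("particles.gif", save_all=True, append_images=frames[1:],
               duration=50, loop=0)
```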


An army of gamblers and a martingale

[Image: coin toss]
Start flipping a coin. How long should you wait (on average) before observing the pattern {HHTHHTT}?! Easy, you might say: that is simply the expected hitting time of the state {H}, starting from state {A}, in the following Markov chain (each state records how much of the pattern has been matched so far).

Indeed, one can solve this kind of problem using a first-step analysis. In other words, if {a} is the expected number of steps before reaching {H} starting from {A}, {b} is the expected number of steps before reaching {H} starting from {B}, {c} is the expected number of steps before reaching {H} starting from {C}, etc., one can readily see that {a,b, \ldots, h} satisfy the system of equations

\displaystyle \begin{array}{rcl} a &=& 1 + \frac12(a+b), \quad b = 1 + \frac12(a+c), \quad c = 1 + \frac12(c+d) \\ d &=& 1 + \frac12(a+e), \quad e = 1 + \frac12(a+f), \quad f = 1 + \frac12(c+g) \\ g &=& 1 + \frac12(e+h), \quad h = 0. \end{array}

After a finite amount of time and coffee one then comes up with the answer {a=128}. That was a little bit tedious, and I don’t really want to know how long one should wait on average before observing the sequence {HHTHTTTHTHHHHTTHTHTTH}.
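As a sanity check (and to spare the coffee), the system above can be handed to a linear solver; a quick sketch, with the unknowns ordered {a, \ldots, g} and {h=0} absorbed directly:

```python
import numpy as np

# Each equation reads x_i = 1 + 0.5*(x_u + x_v), i.e. (I - 0.5*T) x = 1.
# The pairs below are the two states reachable in one coin flip, copied
# from the system written above ('h' is the absorbing state, contributing 0).
pairs = [("a", "b"), ("a", "c"), ("c", "d"), ("a", "e"), ("a", "f"), ("c", "g"), ("e", "h")]
names = "abcdefg"

T = np.zeros((7, 7))
for i, (u, v) in enumerate(pairs):
    for s in (u, v):
        if s != "h":                      # the absorbing state contributes nothing
            T[i, names.index(s)] += 0.5

x = np.linalg.solve(np.eye(7) - T, np.ones(7))
print(dict(zip(names, x)))                # -> a = 128.0
```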

It turns out that there is a very neat way to approach this kind of problem without solving any linear system. Apparently, this trick was discovered in the 80s by Shuo-Yen Robert Li. As often in probability, it boils down to introducing a clever martingale. Suppose that one wants to know how long it takes on average before observing the magic coin pattern {x_1x_2 \ldots x_n} of length {n \geq 1}, with {x_i \in \{H,T\}} for all {1 \leq i \leq n}. To this end, one introduces an army of gamblers, a casino, a game of chance and a martingale. The martingale {M} describes the amount of money earned by the casino. At each round (starting at round {t=1}) the casino tosses a coin and the gamblers can make bets. If a gambler starts betting just before round {k}, he wins if the coin shows {x_1}; if he decides to bet again at round {k+1}, he wins if the coin shows {x_2}, etc. Suppose that there are infinitely many gamblers {G_0, G_1, G_2, \ldots} and that gambler {G_k} starts betting just before round {k+1}. The game is fair in the sense that if a gambler bets an amount {A}, with probability {1/2} he gets {2A} and with probability {1/2} he loses his bet. One supposes that gambler {G_k} starts by betting {1} dollar just before round {k+1} and keeps reinvesting his total capital until he either loses everything (total loss equal to {1} dollar) or observes the magic coin pattern {x_1x_2 \ldots x_n} (total earnings equal to {2^n-1}). Since the game is fair, the amount of money earned by the casino is a martingale and {M_0=0}. For example, {M_1=1} if the first gambler {G_0} loses his bet and {M_1=-1} if he wins. Consider the time {\tau} when the magic coin pattern first appears. This is a stopping time and the optional stopping theorem for martingales shows that {\mathbb{E}[M_{\tau}] = 0}. But one can also compute this quantity another way. By time {\tau}, all the losers have given {1} dollar to the casino, and if gambler {G_k} is still in the game he has won a total amount {2^{\tau-k}-1}. In other words,

\displaystyle M_{\tau} = \tau - \sum_{j=0}^{\tau-1} 2^{\tau-j} \, \delta_j

where {\delta_j=1} if gambler {G_j} is still in the game after round {\tau} and {\delta_j=0} otherwise. The fundamental remark is that the quantity {\sum_{j=0}^{\tau-1} 2^{\tau-j} \, \delta_j} is not random, so that {\mathbb{E}[\tau] = \sum_{j=0}^{\tau-1} 2^{\tau-j} \, \delta_j}. Indeed, right after round {\tau} one knows with certainty whether {G_{j}} is still in the game or not: gambler {G_{\tau-k}} is still in the game only if the {k} last letters of the magic word {x_1x_2 \ldots x_n} are the same as its first {k} letters. For example, if the magic word is {HHTHHTH} then the gamblers who have seen the patterns {H}, {HHTH} and {HHTHHTH} at time {\tau} are precisely those still in the game when the magic word first appears. In this case this shows that {\mathbb{E}[\tau] = 2^1+2^4+2^7}.
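A brute-force simulation is a reassuring way to check the argument; a small sketch (the function name and the number of runs are my own choices):

```python
import random

def simulated_wait(pattern, n_runs=50_000, seed=1):
    """Average number of fair-coin tosses until `pattern` first appears."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        window, tosses = "", 0
        while window != pattern:
            # keep only the last len(pattern) tosses and compare
            window = (window + rng.choice("HT"))[-len(pattern):]
            tosses += 1
        total += tosses
    return total / n_runs

print(simulated_wait("HHTHHTH"))   # hovers around 2^1 + 2^4 + 2^7 = 146
```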

[Image: gambler]

For example, this immediately shows that we have to wait on average {2^{21}+2^1} coin tosses before observing the pattern {HHTHTTTHTHHHHTTHTHTTH}.
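The prefix-suffix rule itself fits in a few lines of python; a sketch (the function name is mine):

```python
def expected_wait(pattern):
    """Expected number of fair-coin tosses before `pattern` first appears:
    sum 2^k over every k such that the last k letters of the pattern
    coincide with its first k letters (Li's gambler argument)."""
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[-k:] == pattern[:k])

print(expected_wait("HHTHHTT"))                  # 128, the first-step-analysis answer
print(expected_wait("HHTHHTH"))                  # 2^1 + 2^4 + 2^7 = 146
print(expected_wait("HHTHTTTHTHHHHTTHTHTTH"))    # 2^21 + 2^1 = 2097154
```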