Recently, Yann Ollivier developed a nice theory of Ricci curvature for Markov chains. In many ways, this can be seen as a geometric language giving another view on the notion of path coupling, developed at the end of the '90s by Martin Dyer and co-workers. It has to be noted that this new notion of curvature is very general and does not require the state space where the Markov chain evolves to have any differential structure, as one might expect at first sight: any state space endowed with a metric suffices.
Let $P$ be a Markov kernel on a metric state space $(E,d)$. We would like to quantify how long it takes for two different particles evolving according to the Markovian dynamic given by $P$ to meet. If the first particle starts at $x$ and the second at $y$, the initial distance between them is $d(x,y)$. At time $t$, what is the average distance between these two particles? For example, if $B^x$ and $B^y$ are two Brownian motions in $\mathbb{R}^n$ started from $x$ and $y$ respectively, there is no reason why $B^x_t$ and $B^y_t$ should be closer to each other than $x$ and $y$. Indeed, one can even show that whatever the coupling of these two Brownian motions we have $\mathbb{E}\big[\,|B^x_t - B^y_t|\,\big] \geq |x-y|$: this is, roughly speaking, because the Euclidean space $\mathbb{R}^n$ has no curvature. The situation is quite different if we were instead considering Brownian motions on a sphere: in this case, trajectories tend to coalesce.
1. Wasserstein distance
In the sequel, we will need to use a notion of distance between probability distributions on the metric space $(E,d)$. The usual total variation distance $\|\mu - \nu\|_{\text{TV}}$, defined by
$$\|\mu - \nu\|_{\text{TV}} \;=\; \sup_{A \subset E}\; \big| \mu(A) - \nu(A) \big|, \qquad (1)$$
is not adapted to our purpose since the metric structure of the space is not exploited. Instead, in order to take into account the distance $d$ of the space $(E,d)$ and develop a notion of curvature, we use the Wasserstein distance $W_1(\mu,\nu)$ between probability measures. It is defined as
$$W_1(\mu, \nu) \;=\; \sup \Big\{ \int_E f \, d\mu - \int_E f \, d\nu \;:\; f : E \to \mathbb{R} \text{ is } 1\text{-Lipschitz} \Big\}. \qquad (2)$$
The distance $d$ is crucial to this definition: a change of distance implies a change of the class of $1$-Lipschitz functions. Since $\int_E f \, d\mu - \int_E f \, d\nu = \mathbb{E}\big[ f(X) - f(Y) \big]$ for any coupling $(X,Y)$ of $\mu$ and $\nu$, and since the function $f$ is $1$-Lipschitz, it follows that $f(X) - f(Y) \leq d(X,Y)$. Consequently, for any coupling $(X,Y)$ we have $W_1(\mu,\nu) \leq \mathbb{E}\big[ d(X,Y) \big]$. Taking the infimum over all the couplings $(X,Y)$ leads to the inequality
$$W_1(\mu, \nu) \;\leq\; \inf \Big\{ \mathbb{E}\big[ d(X,Y) \big] \;:\; (X,Y) \text{ coupling of } \mu \text{ and } \nu \Big\}. \qquad (3)$$
It is a deep result that on any reasonable space this inequality is in fact an equality. Indeed, Kantorovich duality states that on any Radon space $(E,d)$ we have
$$W_1(\mu, \nu) \;=\; \inf \Big\{ \mathbb{E}\big[ d(X,Y) \big] \;:\; (X,Y) \text{ coupling of } \mu \text{ and } \nu \Big\}. \qquad (4)$$
It is interesting to note that under mild conditions on the state space one can always find a coupling that achieves the infimum of (4): this is an easy compactness argument.
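On a finite state space, the right-hand side of (4) is a small linear program, so the Wasserstein distance can be computed explicitly. Here is a minimal sketch (my own helper, not code from the post) using scipy.optimize.linprog: the coupling is the decision variable and its two marginals are the equality constraints.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(mu, nu, dist):
    """W_1(mu, nu) on a finite space, via the coupling formulation (4)."""
    m = len(mu)
    cost = dist.reshape(-1)                  # the coupling pi, flattened row by row
    A_eq = np.zeros((2 * m, m * m))
    for i in range(m):
        A_eq[i, i * m:(i + 1) * m] = 1.0     # row sums:    sum_j pi[i, j] = mu[i]
        A_eq[m + i, i::m] = 1.0              # column sums: sum_i pi[i, j] = nu[j]
    b_eq = np.concatenate([mu, nu])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Example: two distributions on the three-point space {0, 1, 2} with d(i, j) = |i - j|.
points = np.array([0.0, 1.0, 2.0])
dist = np.abs(points[:, None] - points[None, :])
mu = np.array([0.8, 0.2, 0.0])
nu = np.array([0.0, 0.2, 0.8])
print(wasserstein_lp(mu, nu, dist))          # 1.6: a mass of 0.8 travels a distance 2
```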
2. Notion of Curvature
Denoting by $P(x, \cdot)$ the one-step distribution of the Markov chain started from $x$, in the sense that $P(x, A) = \mathbb{P}\big( X_1 \in A \,\big|\, X_0 = x \big)$, we define the local (Ricci) curvature $\kappa(x,y)$ between $x$ and $y$ as
$$\kappa(x,y) \;=\; 1 - \frac{W_1\big( P(x,\cdot),\, P(y,\cdot) \big)}{d(x,y)}. \qquad (5)$$
The closer $\kappa(x,y)$ is to $1$, the more the trajectories started at $x$ tend to meet the trajectories started at $y$.
The interesting case is when the infimum is strictly positive,
$$\kappa \;:=\; \inf_{x \neq y}\; \kappa(x,y) \;>\; 0. \qquad (6)$$
In this case we say that the Markov kernel $P$ is positively curved on $(E,d)$. It should be noted that in many natural spaces it suffices to check that $\kappa(x,y) \geq \kappa$ for all neighbouring states $x$ and $y$ in order to ensure that $\kappa(x,y) \geq \kappa$ for any pair $(x,y)$: this can be proved thanks to the so-called Gluing Lemma. A space without curvature corresponds to the case $\kappa = 0$: for example, a symmetric random walk on $\mathbb{Z}^d$ and a Brownian motion on $\mathbb{R}^d$ both have zero curvature. The curvature $\kappa$ is a property of both the metric space $(E,d)$ and the Markov kernel $P$: indeed, different Markov chains on the same metric space $(E,d)$ generally have different associated curvatures. Given a metric space $(E,d)$ carrying a probability distribution $\pi$, it is an interesting problem to construct a $\pi$-invariant Markov chain with the highest possible curvature $\kappa$.
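To make definition (5) concrete, here is a small numerical sketch (my example, not one from the post): a lazy Ehrenfest-type birth-and-death chain on $\{0,\ldots,N\}$ with the metric $d(i,j)=|i-j|$. On the integer line the Wasserstein distance between two one-step distributions can be computed directly with scipy.stats.wasserstein_distance, and one finds $\kappa(i,j) = 1/N$ for every pair of states.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Lazy Ehrenfest-type chain on {0, ..., N}: pick one of N balls uniformly at random
# and put it into a uniformly chosen urn; the state is the number of balls in the
# first urn. The metric is d(i, j) = |i - j|.
N = 10
states = np.arange(N + 1)

def kernel_row(k):
    """One-step distribution P(k, .) as a vector over {0, ..., N}."""
    p = np.zeros(N + 1)
    p[k] = 0.5                           # with prob. 1/2 the chosen ball stays in its urn
    if k > 0:
        p[k - 1] = 0.5 * k / N           # a ball leaves the first urn
    if k < N:
        p[k + 1] = 0.5 * (N - k) / N     # a ball enters the first urn
    return p

def curvature(i, j):
    """Coarse Ricci curvature kappa(i, j) = 1 - W_1(P(i, .), P(j, .)) / d(i, j)."""
    w1 = wasserstein_distance(states, states, kernel_row(i), kernel_row(j))
    return 1.0 - w1 / abs(i - j)

print([round(curvature(k, k + 1), 4) for k in range(N)])   # neighbours: each value is 1/N = 0.1
print(round(curvature(0, N), 4))                           # the extreme pair gives 1/N as well
```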
The notion of curvature readily generalizes to continuous-time Markov processes by taking a limiting case of (5). For example, one can define the curvature of a continuous-time Markov process as the largest real number $\kappa$ such that for any $x$ and $y$ we have
$$W_1\big( P_t(x, \cdot),\, P_t(y, \cdot) \big) \;\leq\; \big( 1 - \kappa\, t + o(t) \big)\; d(x,y) \qquad (7)$$
for every $t$ small enough. The quantity $P_t(x, \cdot)$ is the distribution of $X_t$ when started from $x$, in the sense that $P_t(x, A) = \mathbb{P}\big( X_t \in A \,\big|\, X_0 = x \big)$.
3. Contraction property
We now show that a positive curvature implies a contraction property. Equation (5) shows that for any pair $(x,y)$ we have $W_1\big( P(x,\cdot),\, P(y,\cdot) \big) \leq \big(1-\kappa(x,y)\big)\, d(x,y) \leq (1-\kappa)\, d(x,y)$. A simple argument shows that one can indeed generalize the situation to any two distributions $\mu$ and $\nu$, in the sense that
$$W_1\big( \mu P,\, \nu P \big) \;\leq\; (1-\kappa)\; W_1(\mu, \nu). \qquad (8)$$
Proof: For any pair $(x,y)$ consider a coupling $\big( U_1(x,y), U_2(x,y) \big)$ of $P(x,\cdot)$ and $P(y,\cdot)$ such that $\mathbb{E}\big[ d\big( U_1(x,y), U_2(x,y) \big) \big] = W_1\big( P(x,\cdot), P(y,\cdot) \big) \leq (1-\kappa)\, d(x,y)$. Now, choose an optimal coupling $(X,Y)$ of $\mu$ and $\nu$, so that $\mathbb{E}\big[ d(X,Y) \big] = W_1(\mu,\nu)$. It is straightforward to check that $\big( U_1(X,Y), U_2(X,Y) \big)$ is a coupling (in general not optimal) of $\mu P$ and $\nu P$, so that
$$W_1\big( \mu P,\, \nu P \big) \;\leq\; \mathbb{E}\Big[ d\big( U_1(X,Y), U_2(X,Y) \big) \Big] \;\leq\; (1-\kappa)\, \mathbb{E}\big[ d(X,Y) \big] \;=\; (1-\kappa)\, W_1(\mu, \nu).$$
Equation (8) is extremely powerful since it immediately shows that
$$W_1\big( \mu P^t,\, \pi \big) \;\leq\; (1-\kappa)^t\; W_1(\mu, \pi) \qquad (9)$$
for any initial distribution $\mu$, where $\pi$ denotes the invariant distribution of $P$. In other words, there is exponential convergence (in the Wasserstein metric) to the invariant distribution at rate $1-\kappa$. In continuous time, this reads
$$W_1\big( \mu P_t,\, \pi \big) \;\leq\; e^{-\kappa t}\; W_1(\mu, \pi). \qquad (10)$$
In other words, the higher the curvature, the faster the convergence to equilibrium.
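As a quick numerical check of (9), one can iterate the lazy Ehrenfest-type chain of the previous sketch (again my example, not the post's): its curvature is $\kappa = 1/N$, its invariant distribution is the Binomial$(N, 1/2)$ law, and the Wasserstein distance to equilibrium is indeed dominated by $(1-\kappa)^t\, W_1(\mu, \pi)$.

```python
import numpy as np
from scipy.stats import binom, wasserstein_distance

# Lazy Ehrenfest-type chain on {0, ..., N} (see the curvature sketch above):
# coarse Ricci curvature kappa = 1/N, invariant law Binomial(N, 1/2).
N = 10
kappa = 1.0 / N
states = np.arange(N + 1)

P = np.zeros((N + 1, N + 1))
for k in states:
    P[k, k] = 0.5
    if k > 0:
        P[k, k - 1] = 0.5 * k / N
    if k < N:
        P[k, k + 1] = 0.5 * (N - k) / N

pi = binom.pmf(states, N, 0.5)       # invariant distribution
mu = np.zeros(N + 1)                 # initial distribution: Dirac mass at 0
mu[0] = 1.0

w1_0 = wasserstein_distance(states, states, mu, pi)
for t in range(0, 31, 5):
    mu_t = mu @ np.linalg.matrix_power(P, t)
    w1_t = wasserstein_distance(states, states, mu_t, pi)
    bound = (1 - kappa) ** t * w1_0
    print(f"t = {t:2d}   W_1(mu P^t, pi) = {w1_t:.4f}   (1 - kappa)^t W_1(mu, pi) = {bound:.4f}")
```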
4. Examples
Let us give examples of positively curved Markov chains.
- Langevin diffusion with convex potential: consider a convex potential $\Psi: \mathbb{R}^n \to \mathbb{R}$ that is uniformly elliptic in the sense that $\operatorname{Hess} \Psi(x) \succeq \lambda\, I$ for some constant $\lambda > 0$ and every $x \in \mathbb{R}^n$. The Langevin diffusion
$$dX_t = -\nabla \Psi(X_t)\, dt + \sqrt{2}\, dW_t$$
has invariant distribution $\pi$ with density proportional to $e^{-\Psi(x)}$. Given a time step $\delta > 0$, the Euler discretization of this diffusion reads
$$X_{k+1} = X_k - \delta\, \nabla \Psi(X_k) + \sqrt{2\delta}\, \xi_k$$
where the $\xi_k \sim \mathcal{N}(0, I)$ are i.i.d. Gaussian vectors. Given two starting points $x$ and $y$, using the same noise $\xi_0$ to define $X_1 = x - \delta\, \nabla\Psi(x) + \sqrt{2\delta}\, \xi_0$ and $Y_1 = y - \delta\, \nabla\Psi(y) + \sqrt{2\delta}\, \xi_0$, it immediately follows that
$$|X_1 - Y_1| \;=\; \big| (x-y) - \delta\, \big( \nabla\Psi(x) - \nabla\Psi(y) \big) \big| \;\leq\; (1 - \lambda\, \delta)\, |x - y|$$
for $\delta$ small enough. In other words, the Langevin diffusion $(X_t)$ is positively curved with curvature (at least) equal to $\lambda$; a numerical illustration is sketched after this list.
- Brownian motion on a sphere: consider a Brownian motion on the unit sphere $S^{n-1}$ of $\mathbb{R}^n$ (with generator $\tfrac12 \Delta_{S^{n-1}}$). Consider two nearby points $x, y$ on this unit sphere: by symmetry, one can always rotate the coordinates so that $x = (a, \sqrt{1-a^2}, 0, \ldots, 0)$ and $y = (-a, \sqrt{1-a^2}, 0, \ldots, 0)$ for some small $a > 0$. For $a \ll 1$ the (geodesic) distance $d(x,y)$ is well approximated by the chordal distance $|x-y| = 2a$, which we use below. One can couple two Brownian motions $X$ and $Y$, one started at $x$ and the other one started at $y$, by the usual symmetry with respect to the hyperplane $\{u \in \mathbb{R}^n : u_1 = 0\}$: in other words, $Y_t$ is the reflection of $X_t$ with respect to this hyperplane. One can check (good exercise!) that the diffusion followed by the first coordinate $Z_t = X^{(1)}_t$ of a Brownian motion on the unit sphere of $\mathbb{R}^n$ is simply given by
$$dZ_t \;=\; -\frac{n-1}{2}\, Z_t\, dt + \sqrt{1 - Z_t^2}\; dW_t.$$
With this coupling, for small time $t$, it follows that
$$\mathbb{E}\big[\, d(X_t, Y_t)\, \big] \;\approx\; \mathbb{E}\big[\, 2\, Z_t\, \big],$$
where the Brownian motion $W$ is used (up to a sign) as the same source of randomness for $X$ and $Y$ since $Y_t$ is the reflection of $X_t$. Since $\mathbb{E}[Z_t] = a\, e^{-\frac{n-1}{2} t}$, it readily follows that
$$\mathbb{E}\big[\, d(X_t, Y_t)\, \big] \;\approx\; \Big( 1 - \frac{n-1}{2}\, t + o(t) \Big)\, d(x,y).$$
In other words, the curvature of a Brownian motion on the unit sphere of $\mathbb{R}^n$ is equal to $\frac{n-1}{2}$ (for the chordal distance); a small simulation of the coordinate diffusion is sketched after this list. Maybe surprisingly, the higher the dimension, the faster the convergence to equilibrium. This is not so unreasonable if one notices that the Brownian increment in $\mathbb{R}^n$ satisfies $\mathbb{E}\big[\, |B_{t+\delta} - B_t|^2\, \big] = n\, \delta$: in high dimension, the driving noise moves the process around the (fixed-size) unit sphere faster.
- Other examples: see the original text for many other examples.
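As mentioned in the Langevin example above, here is a minimal numerical sketch of the synchronous coupling of the Euler scheme. The quadratic potential $\Psi(x) = \tfrac12 x^{T} A x$ is my own choice (so that $\nabla \Psi(x) = Ax$ and $\operatorname{Hess} \Psi = A \succeq \lambda I$); with this coupling the distance between the two chains is deterministic and contracts by at least the factor $(1-\lambda\delta)$ per step.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed potential (my choice, not the post's): Psi(x) = 0.5 * x^T A x with A > 0,
# so grad Psi(x) = A x and Hess Psi = A >= lam * I.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
lam = np.linalg.eigvalsh(A).min()       # strong-convexity (uniform ellipticity) constant

def grad_psi(x):
    return A @ x

delta = 0.05                            # Euler time step (small enough: delta * ||A|| < 1)
x = np.array([5.0, -3.0])
y = np.array([-4.0, 2.0])
d0 = np.linalg.norm(x - y)

print("step   |X_k - Y_k|    (1 - lam * delta)^k * |x - y|")
for k in range(101):
    if k % 20 == 0:
        print(f"{k:4d}   {np.linalg.norm(x - y):10.6f}   {(1 - lam * delta) ** k * d0:10.6f}")
    xi = rng.standard_normal(2)         # the SAME Gaussian noise drives both chains
    x = x - delta * grad_psi(x) + np.sqrt(2 * delta) * xi
    y = y - delta * grad_psi(y) + np.sqrt(2 * delta) * xi
```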
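For the sphere example, one can check the announced contraction rate by simulating the coordinate diffusion with an Euler-Maruyama scheme (again my own sketch; it assumes, as above, that the Brownian motion on the sphere has generator $\tfrac12 \Delta_{S^{n-1}}$). Under the reflection coupling the distance between the two spherical Brownian motions is $2 Z_t$, so the decay of $\mathbb{E}[Z_t]$ at rate $\frac{n-1}{2}$ is exactly the contraction of definition (7).

```python
import numpy as np

rng = np.random.default_rng(2)

# Euler-Maruyama scheme for the first-coordinate diffusion of a Brownian motion
# on the unit sphere of R^n (generator (1/2) * Laplace-Beltrami):
#     dZ_t = -((n - 1) / 2) Z_t dt + sqrt(1 - Z_t^2) dW_t.
n = 10            # ambient dimension
a = 0.2           # initial coordinate: the two coupled points start at distance 2a
dt = 1e-3
T = 0.1
n_paths = 200_000

Z = np.full(n_paths, a)
for _ in range(int(T / dt)):
    dW = np.sqrt(dt) * rng.standard_normal(n_paths)
    # np.clip guards against tiny discretization overshoots with |Z| > 1.
    Z = Z - 0.5 * (n - 1) * Z * dt + np.sqrt(np.clip(1.0 - Z**2, 0.0, None)) * dW

print("empirical contraction  E[Z_T] / a :", Z.mean() / a)
print("predicted  exp(-(n - 1) T / 2)    :", np.exp(-0.5 * (n - 1) * T))
```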
Shiping Liu said,
March 22, 2011 at 8:44 pm
I think there is a typo in the line below equation (1), “the metric structure or the space”
should be “the metric structure of the space”.
In the second line of Section 2, should the $\kappa(x, y)$ be the analogue of the Ricci curvature rather than the sectional curvature?
Alekk said,
March 23, 2011 at 9:21 am
Thanks [Corrected].
Alex Gittens said,
August 4, 2011 at 7:46 pm
This machinery looks like a generalization of the tools used in Oliveira's paper on the convergence of Kac's walk on the orthogonal group (available on arXiv and referenced in Ollivier's paper). I recommend Oliveira's paper to anyone who wants to see these ideas in action but isn't very comfortable with differential geometry. Thanks for pointing this paper out: I've been looking at the convergence of Kac's walk on the sphere, and my earlier attempts to generalize Oliveira's methods failed to provide meaningful mixing times, so maybe this will help.