In the “never too late to learn that” section…

As a number theorist interested in probability, I have accumulated my knowledge of that field (such as it is…) in a rather haphazard fashion, as is probably (pun intended) typical. The gaps in my understanding are wide and not really far between, and I can’t generally expect to take advantage of teaching a class to fill them (we have very good probabilists in Zürich), though I did teach introductory probability for two years in Bordeaux. So it’s not so surprising that I stumbled this morning in my lecture on expanders when defining a Markov chain (which deserves some “tsk, tsk” from my ancestors, since my mathematical genealogy does trace back to Markov…)

More precisely, given a graph $latex \Gamma$, say $latex k$-regular, I defined a random walk on $latex \Gamma$ as a sequence $latex (X_n)$ of random variables taking values in the vertex set $latex V$ which satisfies the conditional probability identity
$latex \mathbf{P}(X_{n+1}=y\mid X_n=x)=\frac{a(x,y)}{k}$
for all $latex n\geq 0$, and all vertices $latex x, y$, where $latex a(x,y)$ is the number of edges (unoriented, oeuf corse) between $latex x$ and $latex y$.
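
(To make the definition concrete, here is a minimal simulation sketch in Python; the particular $latex 3$-regular multigraph, encoded by its adjacency counts $latex a(x,y)$, is a made-up toy example of mine, not anything from the lecture.)

```python
import random

k = 3
# Adjacency counts a(x, y) of a small 3-regular multigraph on 4 vertices
# (a hypothetical toy example: each row sums to k = 3, and a(x, y) = a(y, x)).
a = {
    0: {1: 2, 2: 1},
    1: {0: 2, 3: 1},
    2: {0: 1, 3: 2},
    3: {1: 1, 2: 2},
}

def walk(x0, n):
    """Simulate X_0, ..., X_n: from x, step to y with probability a(x, y)/k."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        ys = list(a[x])
        xs.append(random.choices(ys, weights=[a[x][y] for y in ys])[0])
    return xs

print(walk(0, 10))  # one sample trajectory X_0, ..., X_10
```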

But, as one of the CS people attending the lecture pointed out, this is wrong! The definition of a Markov chain requires that one be able to condition further back in the past: for any choices of $latex x_0, x_1,\ldots, x_n$, one asks that
$latex \mathbf{P}(X_{n+1}=y\mid X_0=x_0,\ldots, X_n=x_n)=\mathbf{P}(X_{n+1}=y\mid X_n=x_n)=\frac{a(x_n,y)}{k}$
which corresponds to the intuition that the behavior at time $latex n$ should be independent of the past trajectory of the walk, except of course for the actual value at time $latex n$.
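
To see that the stronger condition genuinely says more, here is a standard illustration (my own, not from the lecture): on the cycle $latex \mathbf{Z}/N\mathbf{Z}$, consider a walker that keeps the direction of its previous step with probability $latex p\not=1/2$. If the initial direction is chosen uniformly, then conditionally on the current position alone, each step still goes to either neighbor with probability $latex 1/2$, exactly like the simple random walk; but conditioning also on the previous position shifts this probability to $latex p$, so the process is not a Markov chain. A quick simulation confirms both frequencies:

```python
import random

N, p, T = 12, 0.9, 200_000  # cycle size, persistence probability, trials

fwd, fwd_after_fwd, n_after_fwd = 0, 0, 0
for _ in range(T):
    x0 = random.randrange(N)
    d = random.choice([1, -1])            # initial direction, uniform
    x1 = (x0 + d) % N
    d = d if random.random() < p else -d  # keep the direction w.p. p
    x2 = (x1 + d) % N
    # forward-step frequency, conditioning on X_1 only (by symmetry this
    # does not depend on the value of X_1, so the raw frequency estimates it)
    fwd += (x2 == (x1 + 1) % N)
    if x1 == (x0 + 1) % N:                # condition on the previous step too
        n_after_fwd += 1
        fwd_after_fwd += (x2 == (x1 + 1) % N)

print(fwd / T)                      # ~ 0.5: one-step law matches the simple walk
print(fwd_after_fwd / n_after_fwd)  # ~ p: the past beyond X_1 does matter
```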

What puzzled me no end was that I had failed to notice the problem, simply because this more general condition did not play any role in what I was doing, namely in proving the convergence to equilibrium of the random walk! The difference, which was clarified by my colleague M. Einsiedler, is that although the “wrong” definition allows you to compute uniquely the distribution of each step $latex X_n$ as a function of the distribution of $latex X_0$ and of the transition probabilities $latex a(x,y)/k$, one needs the actual Markov condition in order to compute the joint distribution of the whole process $latex (X_n)$, i.e., in order to express probabilities such as
$latex \mathbf{P}(X_n=x_n,\ldots,X_0=x_0)$
as functions of the distribution of $latex X_0$ and the transition probabilities.
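
Spelling this out: by the chain rule, the joint probability is a telescoping product of conditional probabilities given the whole past, and the Markov condition is exactly what allows one to replace each factor by a transition probability, giving
$latex \mathbf{P}(X_0=x_0,\ldots,X_n=x_n)=\mathbf{P}(X_0=x_0)\prod_{i=0}^{n-1}\frac{a(x_i,x_{i+1})}{k}.$
With only the “wrong” definition, the factors $latex \mathbf{P}(X_{i+1}=x_{i+1}\mid X_0=x_0,\ldots,X_i=x_i)$ are not determined, and the joint law is not pinned down (the cycle example above shows this concretely).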

(My confusion arose partly from the fact that, up to this class, I had exclusively used random walks on groups as Markov chains, defined using independent steps, and these are automatically Markovian; I’ve actually exploited the “strong” Markov property in a recent paper with F. Jouve and D. Zywina, but in that specialized context, the point above — I’m not sure it deserves to be called “subtle”! — didn’t register…)
