Not until we go Markov
(The question to which the title is an answer, dear reader, is buried at the end of this essay… you will have to read on. And skipping directly to the bottom won't help, since I haven't gone Markov when writing this.)
What humans know and do, whether as individuals or in groups, can be thought of by an observer as a “stochastic process”. This means that their knowledge and actions (I’ll call these “behaviours” as a crude shorthand) at any point in time are a function of behaviours at previous times, plus some randomness. This implies that future behaviour can be described by a probability distribution, which is, in turn, connected to past behaviours.
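If it helps to see this in code, here is a toy sketch in Python of what I mean. The function and the weights are entirely invented; the only point is that tomorrow's behaviour is some function of the whole past, plus noise the observer never sees.

```python
import random

def next_behaviour(history):
    """Toy stochastic process: tomorrow's 'behaviour' is some function of
    everything that came before, plus randomness the observer cannot see."""
    # Older behaviours count for less here, but none are ignored entirely.
    weights = [0.9 ** (len(history) - 1 - i) for i in range(len(history))]
    signal = sum(w * b for w, b in zip(weights, history)) / sum(weights)
    return signal + random.gauss(0, 1)

history = [0.0]
for _ in range(10):
    history.append(next_behaviour(history))
print(history)
```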
The randomness doesn't mean that people have no reasons for their actions or that they behave in an arbitrary manner. Rather, the point is that there are myriad reasons that the observer is not aware of, and that perhaps even the human being observed could not consciously articulate (including things like volition), and these reasons, combined with past behaviour, shape future behaviour.
So far, I haven’t said anything particularly revolutionary or insightful. The assumption that human and group behaviour can be modelled as stochastic processes is pretty standard across the social sciences and is routinely used in the models we build. It essentially means that past behaviour is useful for predicting future behaviour, though it is not sufficient because of the random component.
Enter the Markov property. A stochastic process describing behaviour has the Markov property when only the current period's data is needed to predict the probability distribution of future behaviour: all the previous points in time that led up to the current period can be safely forgotten, since they won't add much.
Put simply, it is a memoryless, history-free process. The present matters for predicting tomorrow (imperfectly, because of randomness), but once we know what happened today, everything from yesterday stretching back into the infinite past is irrelevant to that prediction.
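In code, the contrast with the toy sketch above is simply that the function no longer needs the whole list. Again, the numbers are invented; only the structure matters.

```python
import random

def next_behaviour(today):
    """Toy Markov process: the distribution of tomorrow's 'behaviour'
    depends only on today's value."""
    return 0.8 * today + random.gauss(0, 1)

history = [0.0]
for _ in range(10):
    history.append(next_behaviour(history[-1]))  # only the last entry is used

# A forecaster who knows only history[-1] has exactly the same predictive
# distribution for tomorrow as one who has kept the entire list.
```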
Let me give you an analogy. Suppose you are a new graduate student in molecular biology. As you prepare to write your dissertation, you will need to review a vast stack of prior literature to get up to date with what has been happening in the field and to make a contribution. How far back will you go, though? You would be shocked if your advisor suggested, "Begin with Darwin and work your way forward!" Instead, you will likely start with some recently updated textbooks and any widely cited meta-analyses or reviews of the previous literature published in the last decade or so, and work your way forward from there.
Implicit in this strategy is the idea that once you know what is in those textbooks and meta-analyses, there is no need to go back and read all the prior detailed work (though, in practice, you may have to revisit a few individual papers, if for no other reason than to check that the meta-analysis represented them correctly). But most certainly, you will not be starting with Darwin. The knowledge accumulation process in molecular biology is thus a stochastic process, because what is known tomorrow is a function of what is known today, plus some randomness: the creative leaps, insights, and genius of individual researchers.
But crucially, it is also a stochastic process with the Markov property. This is why the grad student can reasonably hope to reach the cutting edge of knowledge by reviewing everything forward from that highly regarded meta-analysis published no more than a decade ago.
In many parts of the social sciences, the norm is still for Ph.D. students to start by reading the classics. This indicates that knowledge accumulation in these fields, while stochastic, does not have the Markov property. That is a disadvantage, because graduate students have finite time and the body of research only keeps expanding; as time goes on, it gets ever harder to cover an entire field by starting from the founders and working your way up to the present. In contrast, fields that have learned to encapsulate and summarize prior work as a foundation to build on and move forward from have attained the Markov property. It is sometimes said that we can gauge the rate of progress in a field by the rapidity with which it forgets its founders. By this metric, we're not making that much progress in my neck of the woods…
But what’s bad for scientific progress may be good for humans.
Now, when we write text, draw images, or make videos that get absorbed into the vast maw of large language models, those inputs we produce represent an end state: the result of a long stochastic process reaching far back into the past, not just of our individual life histories but perhaps of entire civilizations and, arguably, of the entire species. For artificial intelligence algorithms to work with the data we produce and effectively forecast us (to the extent that randomness allows), that data must be sufficient on its own, and knowing the history that led up to it must be unnecessary.
In other words, for the training data we feed into AI to be sufficient for AI to predict our behaviours with high accuracy, it must be true that the stochastic processes that represent us have the Markov property. But they are highly unlikely to have it, since neither the vast quantities of data that have flowed into our minds since childhood nor the sequence and history in which they arrived are likely to be captured in the text and images we produce at particular points in our lives.
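To make that concrete, here is a toy simulation, with dynamics invented purely for illustration: a process whose next step depends on its whole history, faced by two forecasters, one who has kept that history and one who, like a model trained only on our outputs, sees just the latest snapshot.

```python
import random

def step(history):
    """Invented non-Markov process: the next value drifts toward the running
    average of the entire past, plus noise no observer can see."""
    return 0.5 * (sum(history) / len(history)) + 0.5 * history[-1] + random.gauss(0, 1)

random.seed(0)
err_snapshot, err_history = [], []
for _ in range(2000):
    history = [random.gauss(0, 1)]
    for _ in range(20):
        history.append(step(history))
    truth = step(history)  # the behaviour we are trying to predict

    pred_snapshot = history[-1]  # sees only the latest output, so extrapolates from it
    pred_history = 0.5 * (sum(history) / len(history)) + 0.5 * history[-1]  # knows the full past

    err_snapshot.append((truth - pred_snapshot) ** 2)
    err_history.append((truth - pred_history) ** 2)

print("mean squared error, snapshot only:", sum(err_snapshot) / len(err_snapshot))
print("mean squared error, full history :", sum(err_history) / len(err_history))
```

For this invented process, the full-history forecaster's error is just the irreducible noise, while the snapshot forecaster's error is larger, and no cleverness applied to the snapshot alone can close the gap entirely.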
I've heard people say that ChatGPT represents a sort of "collective consciousness of the human species". By the argument above, that claim is false, since all it has is data from the internet (vast as that is).
And so, the question: as AI algorithms gather ever more data from our behaviours and draw on ever more computation, will they be able to predict us with high levels of accuracy?