Analysis: Big data may predict where, but not when, conflict happens

December 19, 2019

By Sam Ratner

An Australian soldier patrols in front of shops that were burned by rioters at the Comoro market in Dili, East Timor, May 31, 2006.

Data can tell us a lot about things that have already happened. As our increasingly digital world produces an ever-more complete record of world happenings large and small, our ability to roll back the clock and see exactly how things played out has never been greater. But, like, who cares about yesterday’s news, man?

For a certain type of policymaker, bent on throwing off the shackles of chronology, the promise of big data is not that it can elucidate the past but that it can predict the future.

Security is as prone to this tendency as any other field, and a rash of research has tried to use new data harvests to predict the outbreak of security challenges.

Recently, a group of economists and political scientists released a working paper in which they applied a series of proposed statistical conflict prediction methods to two cases: Colombia from 1988 to 2005 and Indonesia from 1998 to 2014.

If machine-learning models could predict the location and timing of conflict within those periods — authors Samuel Bazzi, Robert Blair, Christopher Blattman, Oeindrila Dube, Matthew Gudgeon, and Richard Merton Peck reasoned — the algorithms that produced them might be useful in predicting future conflicts.

The results will confound “the time haters.”

It turns out that the models are pretty good at predicting where violence will take place, but not very good at saying when it will happen. There are certain “hot spots” for violence that data can identify, but predicting when those hot spots will erupt remains elusive.

Even when data is disaggregated to the year — that is, when the model is asked whether violence will increase in a certain calendar year over a previous year — the models explained none of the year-to-year variations in violence levels in particular locations. 

The authors offer some possible explanations for why it is so hard for machine-learning models to predict conflict timing. One idea is Fearon’s 1995 defense: If war is irrational as a method of conflict resolution, how can we use logic-based models to predict it?

Another suggestion is that conflict actors sometimes make countercyclical decisions for strategic reasons — both sides in a conflict might hold off attacking in a certain location at a certain time if they believe the other side is likely to do the same.

One argument the authors largely dismiss, however, is the idea that more data will solve the problem. The Indonesia and Colombia data sets are substantial, and the models are equally bad at predicting conflict timing if you feed them just the first couple years of the data or all of it.

If you start measuring conflict events over, say, thirty years, you’re at least as likely to be conflating fundamentally different parts of a conflict as you are to be making valid discoveries in new data.

This analysis was featured in Critical State, a weekly newsletter from The World and Inkstick Media.