Machine learning for better homicide counts in Ciudad Juárez

on Jul 30, 2012
Photo Credit: Jesús Villaseca Pérez
Beginning in March 2008, Ciudad Juárez registered an alarming number of homicides and became Mexico's most violent city. According to the Mexican vital statistics system, Ciudad Juárez (coterminous with the Juárez municipality) went from just 202 murders in 2007 to 1,616 in 2008, 2,397 in 2009, and 3,686 in 2010.

Mexican and US officials attribute the dramatic increase in violence to a conflict between the Sinaloa and Juárez Cartels. After a new governor was elected in October 2010, Ciudad Juárez does seem to have started turning around, but it is still an extremely violent city.

Poll of polls including the 'quick count'

on Jul 2, 2012
As expected, Peña Nieto won and AMLO came in second place, but the polls greatly overestimated the voting intention for Peña Nieto, underestimated AMLO's expected vote, and to a lesser extent also underestimated Josefina Vázquez Mota's vote.

The Federal Elections Board in Mexico runs a "quick count" (conteo rápido), a random sample of returns from voting booths across the country, which serves as a highly accurate exit poll. Measuring the Euclidean distance from the normalized "quick count" to the voting preferences each pollster reported, the most accurate pollsters were: SDP Noticias-Covarrubias, Grupo Reforma, Ipsos-Bimsa, and UNO TV-María de las Heras. The worst performing pollsters were Milenio-GEA ISA and Indemerc.
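To make the distance measure concrete, here is a minimal sketch of how such a ranking could be computed. It is not the code behind the ranking above: the candidate labels, pollster names, and every number are invented for illustration, and I'm assuming "normalized" means rescaling the shares of the listed candidates so they sum to 1.

```python
import math

def normalize(shares):
    """Rescale a dict of candidate shares so the values sum to 1."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}

def euclidean_distance(a, b):
    """Euclidean distance between two dicts keyed by the same candidates."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))

# Hypothetical quick count and polls (made-up numbers, not real results).
reference = normalize({"A": 40, "B": 30, "C": 25})
polls = {
    "pollster_x": {"A": 45, "B": 28, "C": 22},
    "pollster_y": {"A": 41, "B": 31, "C": 23},
}

# Smaller distance means the poll was closer to the quick count.
distances = {
    name: euclidean_distance(normalize(shares), reference)
    for name, shares in polls.items()
}
ranking = sorted(distances, key=distances.get)

for name in ranking:
    print(f"{name}: {distances[name]:.4f}")
```

Because every candidate's error is squared before summing, a pollster that badly misses one candidate is penalized more than one that is slightly off on all of them.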

Rerunning my poll of polls using the quick count results as if they were the result of a poll taken on election day with a massive sample size and free of pollster bias (I plan on rerunning this analysis when the final election results are known), I get the chart at the top of the post. The results are quite striking, particularly with regard to Peña Nieto, and indicate a relatively big bias among most pollsters. I'm very curious about what the polling firms did wrong to oversample PRI voters or otherwise bias their results.
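A rough way to see why a single poll with a massive sample size anchors the estimate is simple inverse-variance pooling. This is only a sketch under stated assumptions, not the model used for the chart: the shares and sample sizes below are invented, and the binomial variance p(1-p)/n is a simplification that ignores pollster bias and changes over time.

```python
def pooled_share(polls):
    """Inverse-variance weighted mean of (share, sample_size) pairs.

    Each poll's variance is approximated as p * (1 - p) / n, so polls
    with larger samples receive proportionally more weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for share, n in polls:
        weight = n / (share * (1 - share))  # 1 / variance
        weighted_sum += weight * share
        total_weight += weight
    return weighted_sum / total_weight

# Hypothetical pre-election polls for one candidate (made-up numbers).
pre_election = [(0.45, 1000), (0.44, 1200), (0.46, 1500)]
# Hypothetical election-day "poll" with a very large sample.
election_day = (0.39, 500_000)

before = pooled_share(pre_election)
after = pooled_share(pre_election + [election_day])

print(f"pooled before election day: {before:.3f}")
print(f"pooled with election-day poll: {after:.3f}")
```

With these invented inputs the pooled estimate moves almost all the way to the election-day value, so the gap that remains between each pollster and the pooled estimate can be read as that pollster's bias.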

There's always the possibility that AMLO had a late surge, but it doesn't look like there was one, since some of the private exit polls also tended to put Peña Nieto ahead of what the "quick count" showed. Exit polls are usually much more accurate than pre-election polls, so it's very likely there was bias.


Simon Jackman. Bayesian Analysis for the Social Sciences. John Wiley & Sons, 2009