Strengths and weaknesses of crime data in Mexico

on Feb 10, 2011
With so much data pertaining to the drug war released recently, it's hard to keep track of it all. And as with all things in life, each of the datasets has its pros and cons: the homicide data from the police (SNSP), the homicide data from the vital statistics (INEGI), the estimates of drug war related deaths from the newspapers Reforma and Milenio, and the government's database of homicides presumed to have been committed by organized crime.

Homicide data from the SNSP

This data is based on police reports. The homicide rates calculated with it used to be higher than those based on the INEGI's vital statistics until 2008, when they suddenly started being lower.
Cons
  • The numbers in the database correspond to the number of police reports ("averiguaciones previas") for the crime of homicide, not to the number of dead bodies. A report may contain more than one victim, and reports may also be duplicated.
  • In 2008 the SNSP gave incomplete data to the ICESI for the state of Chihuahua, then when the data for 2009 was released, they updated the 2008 data and again gave incomplete data for Chihuahua. It's understandable that there would be some delay given the incredible rise in homicides, but it's kind of fishy to forget to mention how incomplete it was. Sadly, the incomplete data was used for the homicide rates calculated by the UN.
  • In the state of Tamaulipas, during August 2010, Mexican marines found the dead bodies of 72 persons inside a ranch. The victims were immigrants from Central and South America, presumably killed by the Zetas. Yet during the month of August there were fewer than 70 homicides in all of Tamaulipas.
  • The State of Mexico had 3 months without homicides at the end of 1998.
  • Starting January 2007 the number of homicides in the State of Mexico dropped by half from one month to the next.
  • In 1997, Yucatán, Aguascalientes, and Querétaro had incredibly high homicide rates. There's probably an error in the database.
  • The number of homicides in Tlaxcala before 2007 seems way too high.
  • According to the database there were no homicides in Tlaxcala during 2007. However, the General Secretary of the state verbally reported to the ICESI that there were 42 homicides in 2007. Also in Tlaxcala, during 2006, there was an anomalously high number of kidnappings, probably the result of another error unless the smallest state in Mexico accounted for over 40% of the kidnappings in the entire country.
  • There's no data on homicides by firearm in the states of Baja California, Oaxaca and Tabasco, and it fluctuates wildly in Guerrero and Jalisco.
  • The proportion of homicides by firearm was incredibly low in Chihuahua in 2008. That was the year of the joint operation in Chihuahua and it doesn't match the data from the INEGI. Basically the firearm data from the SNSP is useless unless you make lots of adjustments to it.
  • It looks like the different states send a bunch of Excel files to the SNSP, which then tallies them. With such an outdated way of collecting data it's no surprise the database is plagued with mistakes.
Pros
  • It is constantly updated and the data is only a couple of months out of date. Although the SNSP hasn't updated its online download tool since September, they have updated the PDFs with the crime data. (Here's a scraper written in Python to extract homicide data from the PDFs)
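
Several of the problems listed above (months with no homicides at all, counts that halve from one month to the next) are the kind of thing a simple automated check can surface. Here is a minimal sketch, assuming the monthly counts for one state are already in a list; the function name, the 50% threshold, and the numbers are all made up for illustration and are not real SNSP figures.

```python
def flag_suspicious_months(counts, drop_ratio=0.5):
    """Return (index, reason) pairs for months that look like data errors.

    counts: monthly homicide report counts for one state, oldest first.
    drop_ratio: flag a month whose count is at most this fraction of the
                previous month's count (an arbitrary, assumed threshold).
    """
    flags = []
    for i, n in enumerate(counts):
        if n == 0:
            flags.append((i, "zero reports"))
        elif i > 0 and counts[i - 1] > 0 and n <= counts[i - 1] * drop_ratio:
            flags.append((i, "sudden drop"))
    return flags


# Hypothetical monthly counts, not taken from the SNSP database.
monthly = [210, 198, 205, 0, 0, 0, 201, 97, 102]
for index, reason in flag_suspicious_months(monthly):
    print(index, reason)
```

A flag only means the month deserves a second look. A real drop in violence would trip the same rule, so the output is a list of questions to ask, not a list of errors.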

Homicide data from the INEGI

This data is based on death certificates compiled by the Mexican government.

Cons
  • The deaths of the Acteal Massacre in 1997 were registered as accidents instead of homicides, and certified by a forensic doctor ("medico legista") in Tuxtla Gutiérrez to boot. However, I did check some of the recent massacres and they were all in the database.
  • It takes a while for all the death certificates to be tallied, and the data is usually more than a year out of date.
  • The cutoff date of December 31 means the homicides for the last available year are under-counted by 4% (25% for the last month of the year).
  • There's a weird pattern of lesions of undetermined intent in Ciudad Juárez in 2007.
  • In 2008 most newspapers reported the number of deaths in Ciudad Juárez as 1,650, while the number in the database is 1,610. The discrepancy is probably because newspapers tend to report the number of homicides not only in Juárez but also in the adjacent municipalities. In 2009 there were 2,316 homicides in the database (about 2,375 taking into account the undercount), but press reports placed the number of homicides at close to 2,700.
Pros
  • This is the best record of homicides in Mexico. If you download the mortality database from SINAIS you'll get a daily record for all deaths in Mexico at the locality and municipality levels. You can also find out how many people died from firearms, poisoning, knife wounds, etc.
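
The undercount caused by the December 31 cutoff can be corrected for, at least roughly. The sketch below assumes that "under-counted by 4%" means 4% of the eventual total is still missing, so the registered count is divided by 0.96. That reading is my assumption (the other reading would multiply by 1.04), the function name is made up, and the figures are hypothetical.

```python
def adjust_for_undercount(observed, undercount_rate):
    """Estimate the eventual count, given the share of deaths not yet registered.

    Assumes observed = eventual * (1 - undercount_rate).
    """
    if not 0 <= undercount_rate < 1:
        raise ValueError("undercount_rate must be in [0, 1)")
    return observed / (1 - undercount_rate)


# Hypothetical figures: 960 homicides registered so far for the latest year,
# 75 of them in December.
print(round(adjust_for_undercount(960, 0.04)))  # whole year, 4% assumed missing
print(round(adjust_for_undercount(75, 0.25)))   # December, 25% assumed missing
```

The rates themselves vary by place and year, so a single national factor is only a first approximation.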

Execution tallies by the newspapers Milenio and Reforma 

Cons
  • Given the low prosecution rates it is not surprising that the Reforma and Milenio series differ. However, the difference between the series should be random, and starting in June 2009 Reforma shows a precipitous drop only to go back up again. Given that Chihuahua accounted for most of the drop, it looks like Reforma missed some narco-executions.
    I am thankful for Reforma's steadfast devotion to the task of tallying the homicides week by week. It matters so that people know what is happening and the government can't hide the magnitude of this tragedy. I've used their data before, but something went wrong in the state of Chihuahua in 2009.
  • Milenio went from counting more drug war related homicides than the government to fewer.
 
Pros
  • Constantly updated
  • Data from Reforma is available by week and at the state level

Crimes presumed to be linked with organized crime (drug war related murders)

This data is based on police reports, but filtered to only include those deaths presumed to be linked with organized crime (drug cartels) and exclude duplicate reports.

Cons
  • The government missed the incredible rise in murders in Ciudad Juárez right before the army arrived. There's also a discrepancy with the Baja California homicide data.
  • The government uses a definition of an execution that is at odds with how the newspapers define one (all drug war related deaths).
  • Since the database contains the number of deaths and not police reports like the SNSP homicide data, we can compare it to the INEGI. A higher number of drug-related homicides than total homicides would indicate a serious problem with at least one of the databases:

    There's a big discrepancy in the state of Sinaloa. The differences in other states are small and may be due to the deaths being ordered by date of registration instead of occurrence. I also only compared by state, since it is unclear in the drug-related homicide database whether the municipality refers to the place where the murder took place or where it was registered.
  • If you add the values for all the aggressions, shootouts, and executions and compare them with the precomputed totals in the database, the values for April 2009 (Acaponeta) and October 2010 (Manzanillo) are off by one, which speaks volumes about the care with which it was compiled.
  • We can also compare the drug-related database to the SNSP homicide data. As I suspected, the number of homicides in Chihuahua and Baja California was underreported:
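
Both cross-checks in this list compare a count of drug-related deaths against a count that should be at least as large. A minimal sketch of that comparison follows, using placeholder state names and made-up numbers rather than the real figures; the function name is also just for illustration.

```python
def states_exceeding_total(drug_related, total):
    """Return {state: excess} where drug-related deaths exceed the total count.

    States missing from either dictionary are skipped.
    """
    return {
        state: drug_related[state] - total[state]
        for state in sorted(drug_related)
        if state in total and drug_related[state] > total[state]
    }


# Hypothetical counts per state, not the real data.
drug_related = {"State A": 120, "State B": 45, "State C": 300}
all_homicides = {"State A": 150, "State B": 40, "State C": 310}
print(states_exceeding_total(drug_related, all_homicides))
```

Any state that shows up in the result has a problem in at least one of the two sources. The check cannot say which one.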

Pros
  • The database is divided into executions, shootouts, and aggressions against the government, although the definitions seem somewhat dubious: shootouts can start as aggressions, and any homicide by firearm may be counted as an execution.
  • Contains recent data
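
The off-by-one totals mentioned in the cons are easy to catch mechanically by re-adding the three categories for each row. This is a sketch under the assumption that every row has been loaded as a dictionary; the column names and the example rows are hypothetical and do not reflect the actual layout of the government file.

```python
def rows_with_bad_totals(rows):
    """Return the rows whose stored total differs from the sum of the categories."""
    return [
        row
        for row in rows
        if row["executions"] + row["shootouts"] + row["aggressions"] != row["total"]
    ]


# Hypothetical rows; the real database uses its own column names and values.
rows = [
    {"place": "Municipality A", "month": "2009-04",
     "executions": 3, "shootouts": 1, "aggressions": 0, "total": 5},
    {"place": "Municipality B", "month": "2010-10",
     "executions": 2, "shootouts": 0, "aggressions": 1, "total": 3},
]
bad = rows_with_bad_totals(rows)
print([row["place"] for row in bad])
```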

Since I suspect that homicides by firearm account for a big chunk of drug-related homicides, I decided to compare the drug-related homicides in the most violent states (excluding Sinaloa because of the discrepancy) with the data from the INEGI:
The relationships between firearm homicides and drug-related homicides look linear...
except for Baja California, but that will be the topic of my next post...
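
One way to put a number on "looks linear" is an ordinary least-squares fit of drug-related homicides against firearm homicides for each state. The sketch below is not the method used for the charts in this post. It is a plain-Python illustration with made-up monthly counts, and the function name is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly counts for one state.
firearm_homicides = [10, 20, 30, 40]
drug_related_homicides = [8, 18, 28, 38]
slope, intercept = fit_line(firearm_homicides, drug_related_homicides)
print(slope, intercept)
```

A state where the points scatter far from the fitted line, or where the slope is very different from the others, is the kind of outlier worth a closer look.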



2 comments:

ben said...

Ever since I stumbled on this blog last summer I've been meaning to leave a comment to show my appreciation.
Your posts are fine examples of the type of careful and curious work that any statistician, researcher or journalist should aspire to, and as far as I know you are doing this "pro bono", so I think you deserve some props. I especially love that you often share the code.

Great job, thanks!

Diego said...

Thanks!
