This spring, the FBI expanded its practice of warehousing DNA sample data from samples of criminal convicts to samples from people who have been arrested but not convicted of a crime. As a result, the FBI has been adding new samples to its database (which goes by the names CODIS and NDIS) at an accelerated rate of nearly 1.7 million a year. The database has grown so quickly that the FBI has indicated the need for new software and more money to handle it all:
Through the combination of increased Federal funding and expanded database laws, such as the DNA Fingerprint Act of 2005, the number of profiles in NDIS has and will continue to dramatically increase resulting in a need to re-architect the CODIS software.
The National DNA Index System (NDIS) functions by having states send in DNA samples according to the standards set by their own laws and regulations. Publicly available statistics on the number of samples filed by state (as of September 2009) show considerable variability in the extent to which states contribute samples. What accounts for this variation?
Taking State Population Size into Account
Part of that variability is due to variation in state population; it is reasonable to expect that large states such as Texas would contribute more DNA samples than South Dakota simply because Texas has more people in it than South Dakota does. To account for variability in population from state to state, let’s look at a ratio instead:
(State’s Share of all DNA Samples sent in / State’s Share of U.S. Population)
When this ratio is larger than 1, a state sends in a larger share of DNA samples than its share of the U.S. population. Such is the case for Wyoming. As of September 2009, the state of Wyoming has sent in a small number of DNA samples to be included in the CODIS-NDIS database, just 13,202 (0.185% of all DNA samples entered into CODIS-NDIS). But the population of Wyoming is also small, just 522,830 (0.175% of the U.S. population). The resulting ratio of the two shares, 1.056, indicates that Wyoming actually sends in a disproportionately large number of samples to be included in the FBI’s database, about 5.6% more than its population share alone would predict.
When the ratio is smaller than 1, on the other hand, a state is sending in a smaller share of DNA samples for the FBI’s database than its population share alone would predict. The sheer number of DNA samples sent in from Texas (469,137) dwarfs the number sent in by Wyoming. But 23,904,380 people live in Texas, many more than live in Wyoming. The ratio of Texas’s share of samples to its share of the population (0.8217) indicates that Texas actually sends in a disproportionately small number of DNA samples to the federal government, only 82.17% of what its population share alone would predict.
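As a rough illustration of the ratio described above, here is a minimal Python sketch. The function names (`representation_ratio`, `ratio_from_counts`) are hypothetical and chosen only for this example. It uses the rounded Wyoming percentages quoted in the text, so the result differs slightly from the 1.056 reported above, which was presumably computed from unrounded shares.

```python
def representation_ratio(sample_share, population_share):
    """Ratio of a state's share of DNA samples to its share of U.S. population.

    Both arguments must be in the same units (both percentages or both fractions).
    A result above 1 means over-representation; below 1, under-representation.
    """
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return sample_share / population_share


def ratio_from_counts(state_samples, total_samples, state_pop, total_pop):
    """Same ratio, computed from raw counts instead of precomputed shares."""
    if total_samples <= 0 or total_pop <= 0:
        raise ValueError("totals must be positive")
    return representation_ratio(state_samples / total_samples,
                                state_pop / total_pop)


# Wyoming, using the rounded percentages quoted in the text:
# 0.185% of all samples, 0.175% of the U.S. population.
wyoming = representation_ratio(0.185, 0.175)
print(f"Wyoming ratio (from rounded shares): {wyoming:.3f}")
```

The counts-based version needs the national totals for samples and population, which are not quoted in this post, so only the share-based call is shown with real numbers.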
The map below shows us which states send in disproportionately small numbers of DNA samples to the FBI’s database (blue and green states), and which states send in disproportionately large numbers of DNA samples:
There is only one area of clear consistency on this map, and that is the Northeast. From Maryland and Delaware clear through to Maine, the states of the Northeastern United States send a disproportionately small number of DNA samples on to the federal government for storage and use in the FBI’s database. Elsewhere in the country, there are no clear regional patterns. Some southern states (Virginia, the Carolinas, Mississippi and Florida) send disproportionately large numbers of samples on to the FBI, but other southern states don’t. Mountain, Western and Midwestern states are also mixed. This is not a simple story that can be mapped onto one contrast: “red” states versus “blue” states, populous states versus small states, coastal states versus inland states.
Do State Laws for Archiving Arrestees’ DNA Make the Difference?
What’s going on here? One possibility is that the variation in numbers reflects differences in state law. As of 2008, 13 states out of the 50 (with the addition of Maryland during that year) allowed arrestees’ DNA to be included in the FBI’s database. They are:
But comparing those names to the map of over- and under-representation in the DNA database again shows no consistent pattern: states like Kansas, Louisiana, Maryland, North Dakota, Tennessee, Minnesota and Texas have more expansive DNA collection laws but less representation in the federal DNA database than their population share would predict.
The Impact of Crime
An additional factor that we haven’t considered yet is variation in reported crime. Where there are more crimes, there should be more convicts whose DNA is to be sampled and, in some states, more arrestees to be swabbed for the database as well. We’ll look at the FBI’s most recent Uniform Crime Report data for 2007 to get an idea of the crime rate for each state. Although the UCR has been criticized for only measuring crimes reported to the police, that methodological issue actually works to our favor here, since it is crimes reported to the authorities for investigation, arrest and prosecution that would lead to inclusion in the FBI’s database.
Below is a scatterplot in which each state is represented as a blue dot placed according to the state’s UCR violent crime rate (along the X-axis) and the DNA sampling ratio we discussed above (along the Y-axis):
The black line in the graphic above represents the best-fit line describing the overall pattern in the data: every increase of 100 units in a state’s violent crime rate is associated with an increase of 0.08 in the ratio describing how many DNA samples the state sends on to the FBI. States with more violent crime do indeed tend to send more DNA samples on to the FBI.
But that’s only a tendency, and as the R-squared statistic associated with this tendency tells us, the violent crime rate explains only 11.98% of the variation in the DNA sampling ratio. A great deal of variation remains, and some states with very high crime rates send on a whole lot fewer DNA samples than some states with very low crime rates do. Beyond the two red parallel lines I’ve drawn are some significant outliers, states that don’t behave as you’d expect if you thought that violent crime was the only factor driving DNA sampling.
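For readers who want to reproduce this kind of analysis, here is a minimal sketch of a best-fit line, its R-squared, and an outlier band, in plain Python. Several things here are assumptions: the function names are hypothetical, the data points are synthetic placeholders rather than the actual state figures, and the band width (a multiple `k` of the residual standard error) is just one common choice, since the post does not say how the two parallel lines were drawn.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys accounted for by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def flag_outliers(labels, xs, ys, slope, intercept, k=2.0):
    """Return {label: residual} for points more than k residual
    standard errors above or below the fitted line (assumed band rule)."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    std_err = (sum(r ** 2 for r in residuals) / (len(residuals) - 2)) ** 0.5
    return {lab: r for lab, r in zip(labels, residuals) if abs(r) > k * std_err}


# Synthetic placeholder data, NOT the real state figures:
# x = violent crime rate, y = DNA sampling ratio.
labels = ["A", "B", "C", "D", "E", "F"]
crime = [200, 300, 400, 500, 600, 700]
ratio = [0.9, 1.4, 0.7, 1.1, 1.6, 1.0]

slope, intercept = fit_line(crime, ratio)
print(f"change in ratio per 100 units of crime rate: {slope * 100:.3f}")
print(f"R-squared: {r_squared(crime, ratio, slope, intercept):.3f}")
print("outliers:", flag_outliers(labels, crime, ratio, slope, intercept, k=1.0))
```

Positive residuals correspond to states sending more samples than the line predicts and negative residuals to states sending fewer, which is how the two lists of outlier states in this post are split.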
The following are states that send on a significantly higher-than-predicted number of DNA samples to the FBI, even after taking population and violent crime into account:
The following are states that send on a significantly lower-than-predicted number of DNA samples to the FBI, even after taking population and violent crime into account:
Can you find any pattern in that mix of states? I sure can’t.
Population levels in different states have a trivial effect on the amount of DNA sampling going on there, and the violent crime rate has a non-trivial but non-overwhelming effect upon the practice. Beyond this, explanations for the rather large variation in the practice of DNA data warehousing remain elusive to me.
What do you think’s going on here?