I dug up my old (mid-2007) DNA Birthday Paradox calculations following a remark in the article comments. To summarise, here are the figures, produced using the formulas from http://en.wikipedia.org/wiki/Birthday_paradox
With match odds of 'one in a billion', a database of
151743 profiles would give you a 99.999% chance of finding two profiles in it that coincidentally match
225070 profiles would give you a 99.999999999% chance of finding two profiles in it that coincidentally match
though this is not the same as trying to find a match given a known profile to match against.
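For anyone wanting to check figures of this kind, here is a minimal Python sketch. It assumes the usual approximation from the Wikipedia article, p ≈ 1 − exp(−n(n−1)/(2N)), with N = one billion equally likely 'boxes'. The function names are my own, not from any library.

```python
import math

def collision_probability(n_profiles, n_boxes):
    """Approximate chance that at least two of n_profiles land in the same box.

    Uses p = 1 - exp(-n(n-1) / (2N)), which assumes every box is equally likely.
    """
    pairs = n_profiles * (n_profiles - 1) / 2
    # expm1 keeps precision when the result is very close to 0 or 1
    return -math.expm1(-pairs / n_boxes)

def profiles_needed(miss_probability, n_boxes):
    """Smallest n for which the chance of NO coincidental pair is <= miss_probability."""
    # Solve n(n-1) >= 2N * ln(1/q) for n
    target = 2 * n_boxes * math.log(1 / miss_probability)
    return math.ceil((1 + math.sqrt(1 + 4 * target)) / 2)

if __name__ == "__main__":
    one_in_a_billion = 1e9
    for miss in (1e-5, 1e-11):  # i.e. 99.999% and 99.999999999% chance of a pair
        print(miss, profiles_needed(miss, one_in_a_billion))
```

Passing the chance of no match (for example 1e-5 for 99.999%) rather than the percentage itself avoids rounding trouble when the probability is a long string of nines.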
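With an initial profile and a database of 50 million to search against, there is roughly a 5% probability that at least one profile from a different person matches it by coincidence, so a match found could be wrong. This assumes totally random profiles with an even distribution.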
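The single-profile search is a simpler calculation than the birthday one, and can be sketched as below. This assumes each unrelated profile in the database independently matches with probability one in a billion. The function name is hypothetical.

```python
import math

def chance_of_coincidental_hit(database_size, match_odds):
    """Chance that at least one unrelated profile in the database matches a given profile.

    Computes 1 - (1 - 1/match_odds) ** database_size in a numerically stable way.
    """
    return -math.expm1(database_size * math.log1p(-1 / match_odds))

if __name__ == "__main__":
    # 50 million profiles searched at 'one in a billion'
    print(chance_of_coincidental_hit(50_000_000, 1e9))
```

For small probabilities the result is close to database_size / match_odds, which is where the figure of about 5% comes from.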
Scale those figures up to the national DNA database of 5 million profiles and, on the possibly big assumption that the formulas scale up, you are pretty much guaranteed a whole stack of coincidental matches. We are told that there is an estimated rate of around 13% "replicates", which are apparently profiles derived from multiple samples given by the same people under different names after being arrested more than once.
The question arises as to how many of these "replicates" are in fact coincidental matches from different people whose profiles just happen to fit in the same numerical 'box'. Taking something very complicated and reducing it down to a limited set of numbers is always going to have the possibility of 'collisions', as with hashing functions.
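For a rough sense of scale on that question, the expected number of coincidental pairs can be sketched as follows. It assumes every pair of profiles from different people matches independently with probability one in a billion, and it ignores relatives, population structure and partial profiles, so treat it as an order-of-magnitude guess only. The function name is my own.

```python
def expected_coincidental_pairs(n_profiles, match_odds):
    """Expected number of pairs of different people whose profiles coincide.

    Number of distinct pairs, n(n-1)/2, times the per-pair match probability.
    """
    pairs = n_profiles * (n_profiles - 1) / 2
    return pairs / match_odds

if __name__ == "__main__":
    # 5 million profiles at 'one in a billion'
    print(expected_coincidental_pairs(5_000_000, 1e9))
```

Under these assumptions, 5 million profiles give an expectation of roughly 12,500 coincidental pairs.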
A closer inspection and taking additional information into account would normally sort out any confusion, though this hasn't always been done properly in the past.
I don't suppose anyone fancies doing the numbers themselves to corroborate or refute my figures...? Just because nobody has yet argued with these doesn't guarantee correctness...