lundi 20 avril 2015

Using ABS(Checksum(NewId())) % n as a random ID pointer returns intermittent null results, and intermittent multiple results

I'm building my first de-identification script, and running into issues with my approach.

I have a table ("dbo.pseudonyms") whose "firstname" column is populated with 200 rows of data. Every row in this column of 200 rows has a value (none are null). This table also has an "id" column (int, primary key, not null) with the numbers 1-200.

What I want to do is, in one statement, re-populate my entire USERS table with firstname data randomly selected for each row from my "pseudonyms" table.

To generate the random number for picking I'm using ABS(Checksum(NewId())) % 200. Every time I do SELECT ABS(Checksum(NewId())) % 200 I get a numeric value in the range I'm looking for just fine, no intermittently erratic behavior.

HOWEVER, when I use this formula in the following statement:

SELECT pn.firstname FROM DeIdentificationData.dbo.pseudonyms pn WHERE pn.id = ABS(Checksum(NewId())) % 200

I get VERY intermittent results. I'd say about 30% of the results return one name picked out of the table (this is the expected result), about 30% come back with more than one result (which is baffling, there are no duplicate "id" column values), and about 30% come back with NULL (even though there are no empty rows in the "firstname" column)

I did look for quite a while for this specific issue, but to no avail so far. I'm assuming the issue has to do with using this formula as a pointer, but I'd be at a loss how to do this otherwise.

Thoughts?




Aucun commentaire:

Enregistrer un commentaire