I'm building my first de-identification script, and running into issues with my approach.
I have a table ("dbo.pseudonyms") whose "firstname" column is populated with 200 rows of data. Every row in this column of 200 rows has a value (none are null). This table also has an "id" column (int, primary key, not null) with the numbers 1-200.
What I want to do is, in one statement, re-populate my entire USERS table with firstname data randomly selected for each row from my "pseudonyms" table.
To generate the random number for picking I'm using ABS(Checksum(NewId())) % 200
. Every time I do SELECT ABS(Checksum(NewId())) % 200
I get a numeric value in the range I'm looking for just fine, no intermittently erratic behavior.
HOWEVER, when I use this formula in the following statement:
SELECT pn.firstname FROM DeIdentificationData.dbo.pseudonyms pn WHERE pn.id = ABS(Checksum(NewId())) % 200
I get VERY intermittent results. I'd say about 30% of the results return one name picked out of the table (this is the expected result), about 30% come back with more than one result (which is baffling, there are no duplicate "id" column values), and about 30% come back with NULL (even though there are no empty rows in the "firstname" column)
I did look for quite a while for this specific issue, but to no avail so far. I'm assuming the issue has to do with using this formula as a pointer, but I'd be at a loss how to do this otherwise.
Thoughts?
Aucun commentaire:
Enregistrer un commentaire