jeudi 30 juin 2016

MS SQL Server: extract unique ids and then get all rows matching a random sample of those unique ids

I'd like to do something similar to How to select random IDs and all transactions within those random IDs in SQL but all within Microsoft SQL Server 2011. My very large master table has travel records by individual and month:

FileDate     PersID     Location...
200001          1           A
200001          2           B
200001          3           A
200002          1           C
200002          2           C
200003          1           D
200004          1           A

There are about 260 million rows in the master table and about 4.2 million unique PersIDs.

To get started, I'm trying to just return the random sample of unique PersIDs. I've tried a number of things, the simplest of which is

SELECT distinct PersID FROM mastertbl
TABLESAMPLE (1 percent);

I've also tried wrapping the "SELECT distinct PersID" query as a subquery within another query and applying TABLESAMPLE in the outer query.

All of my attempts return about 1.7 million rows, which is 40% of the unique PersIDs not 1%. This makes me suspicious that my sampling method is sampling the master table and then applying the distinct criteria, but I can't figure out how to fix it.




Aucun commentaire:

Enregistrer un commentaire