I have a python program to extract data from Microsoft SQL Server and load them to another table in the same database. Using this extraction program, I am trying to do the following.
- Get the input arguments from an excel file
- Fetch one random record from the database for each row present in the excel
- Load the output to another table
Using the RAND() function, I'm seeing duplicate records being retrieved most of the time even though the combination has a sufficient number of entries in the database. I tried a couple of other approaches like NEWID() and calculating the total number of rows and then retrieving a random row using numpy. But these queries take hours to execute even for a single combination and does not seem feasible.
Note: The table is huge (~7 million records) and left joins are used to retrieve the required data.
Are there any other methods to fix this issue?
Aucun commentaire:
Enregistrer un commentaire