Monday, April 18, 2022

Duplicate Records with RAND() function

I have a Python program that extracts data from Microsoft SQL Server and loads it into another table in the same database. With this extraction program, I am trying to do the following.

  1. Get the input arguments from an Excel file
  2. Fetch one random record from the database for each row present in the Excel file
  3. Load the output into another table

Using the RAND() function, I see duplicate records being retrieved most of the time, even though each combination has a sufficient number of matching entries in the database. I tried a couple of other approaches, such as NEWID(), and counting the total number of rows and then retrieving a random row using numpy. But these queries take hours to execute even for a single combination, so they do not seem feasible.
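To make the goal concrete, here is a minimal sketch of one possible way to avoid duplicates on the Python side: draw keys without replacement per combination, instead of drawing one random row per query. It assumes the candidate primary keys for each combination have already been fetched; the names `pick_unique_keys`, `requests`, and `candidates` are hypothetical and not part of my actual program.

```python
import random
from collections import Counter


def pick_unique_keys(requests, candidates, seed=None):
    """Pick one key per request without reusing a key.

    requests   -- list of combinations, one per Excel row (hypothetical shape)
    candidates -- dict mapping a combination to its list of primary keys
    Returns a list aligned with `requests`; an entry is None when a
    combination has fewer candidate keys than requests.
    """
    rng = random.Random(seed)
    needed = Counter(requests)  # how many rows each combination needs

    # Sample without replacement once per combination.
    pools = {}
    for combo, count in needed.items():
        keys = list(candidates.get(combo, []))
        pools[combo] = rng.sample(keys, min(count, len(keys)))

    # Hand out the sampled keys in request order.
    picks = []
    for combo in requests:
        pool = pools[combo]
        picks.append(pool.pop() if pool else None)
    return picks
```

Because `random.sample` draws without replacement, two Excel rows with the same combination can never receive the same key, as long as the combination has enough candidates.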

Note: The table is huge (~7 million records) and left joins are used to retrieve the required data.
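Given the table size, one idea I am considering is to pick the keys first and run the left joins only for the picked rows. The sketch below only builds a parameterized query and its parameter list; it assumes a driver that uses `?` placeholders (as pyodbc does), and the base query, table names, and column names are hypothetical. The identifiers are interpolated as plain text, so they must come from trusted code, never from the Excel file.

```python
def chunked(keys, size=1000):
    """Yield successive slices of `keys`, each at most `size` long.

    Chunking keeps each statement well below SQL Server's limit of
    2100 parameters per request.
    """
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def build_key_lookup(base_query, key_column, keys):
    """Return (sql, params) restricting `base_query` to the given keys.

    base_query -- a SELECT with its joins but no WHERE clause (hypothetical)
    key_column -- trusted column reference, e.g. "m.id"
    keys       -- the primary keys picked beforehand
    """
    placeholders = ", ".join("?" for _ in keys)
    sql = f"{base_query} WHERE {key_column} IN ({placeholders})"
    return sql, list(keys)
```

Each `(sql, params)` pair could then be passed to the driver's `cursor.execute(sql, params)`, one call per chunk.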

Are there any other methods to fix this issue?



