I have a very large table (~300 million rows) with the following structure: my_table(id, group, chunk, new_id), where chunk and new_id are currently NULL.
I want to assign each row of a group to a random chunk, with a distinct new_id within that chunk. Each chunk should have a fixed size of 100 rows.
For example, if group A has 1278 rows, they should go into 13 chunks (0-12): 12 full chunks of 100 rows whose new_id values cover the range 0-99, and one final chunk of 78 rows whose new_id values cover the range 0-77.
The assignment to chunks and the ordering within chunks should be a random permutation, with each row in A receiving a unique (chunk, new_id) tuple.
I currently do this successfully with pandas, but it takes hours, mostly due to memory and bandwidth limitations.
Is it possible to do this using only a SQL query? I'm using Postgres 9.6.
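The direction I have in mind is something like the sketch below: randomly order the rows within each group with a window function, then derive chunk and new_id from the resulting position by integer division and modulo. This is only a sketch and assumes id is a unique key ("group" is quoted because it is a reserved word); I don't know whether it would perform acceptably at this scale.

    -- Number the rows of each group in random order, starting at 0,
    -- then derive chunk and new_id from that position.
    UPDATE my_table t
    SET chunk  = s.rn / 100,   -- integer division: positions 0-99 -> chunk 0, etc.
        new_id = s.rn % 100    -- position within the chunk, 0-99
    FROM (
        SELECT id,
               row_number() OVER (PARTITION BY "group" ORDER BY random()) - 1 AS rn
        FROM my_table
    ) s
    WHERE t.id = s.id;

With 1278 rows in group A this yields rn values 0-1277, i.e. chunks 0-12, with the last chunk holding 78 rows and new_id values 0-77, as described above. My concern is that a single UPDATE rewriting all ~300 million rows may be very slow.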