I have a very large table (~300 million rows) with the following structure: my_table(id, group, chunk, new_id), where chunk and new_id are currently NULL.
I want to assign each row of a group to a random chunk, with a distinct new_id within that chunk. Each chunk should have a fixed size of 100 rows.
For example, if group A has 1278 rows, they should go into 13 chunks (0-12): 12 full chunks of 100 rows whose new_id values cover the range 0-99, and one final chunk of 78 rows whose new_id values cover the range 0-77.
The assignment to chunks and the ordering within chunks should be a random permutation, with each row in A receiving a unique (chunk, new_id) tuple.
I currently do this successfully with pandas, but it takes hours, mostly due to memory and bandwidth limitations.
Is it possible to do this using only a SQL query? I'm using Postgres 9.6.
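The direction I have in mind is something like the sketch below: randomly order the rows within each group with a window function, then derive chunk and new_id from the resulting position by integer division and modulo. This is only a sketch and assumes id is a unique key ("group" is quoted because it is a reserved word); I don't know whether it would perform acceptably at this scale.

    -- Number the rows of each group in random order, starting at 0,
    -- then derive chunk and new_id from that position.
    UPDATE my_table t
    SET chunk  = s.rn / 100,   -- integer division: positions 0-99 -> chunk 0, etc.
        new_id = s.rn % 100    -- position within the chunk, 0-99
    FROM (
        SELECT id,
               row_number() OVER (PARTITION BY "group" ORDER BY random()) - 1 AS rn
        FROM my_table
    ) s
    WHERE t.id = s.id;

With 1278 rows in group A this yields rn values 0-1277, i.e. chunks 0-12, with the last chunk holding 78 rows and new_id values 0-77, as described above. My concern is that a single UPDATE rewriting all ~300 million rows may be very slow.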