Wednesday, 24 June 2020

Choosing random lines with replacement from an XY file

I have an XY file with over 40000 unique lines of floating-point numbers, and I want to apply bootstrap resampling to it. Bootstrap resampling draws N random lines with replacement from the original file, where N is the number of lines in that file. The new data set therefore has the same number of lines as the original, may contain some lines several times, and may not contain some of the original lines at all. I tried shuffling the lines using

“shuf -n N input > output”

and

“sort -R input | head -n N > output”

but neither command seems to sample with replacement.

I would greatly appreciate it if somebody could suggest a way to do this using AWK and shell.
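For reference, here is a minimal sketch of the resampling described above, assuming the whole file fits in memory (which should hold for roughly 40000 short lines). The function name bootstrap_resample and its optional seed argument are hypothetical names chosen for illustration, not part of any existing tool. The function stores every line in an array, then prints as many lines as it read, each picked uniformly at random, so a line can appear more than once or not at all.

```shell
# bootstrap_resample FILE [SEED]
# Hypothetical helper: prints as many lines as FILE contains, each drawn
# uniformly at random WITH replacement from FILE.
bootstrap_resample() {
    awk -v seed="${2:-}" '
        BEGIN {
            # Use the given seed if there is one; otherwise seed from the
            # time of day (one-second resolution in most awk versions).
            if (seed == "") srand(); else srand(seed)
        }
        { line[NR] = $0 }              # keep every input line in memory
        END {
            for (i = 1; i <= NR; i++)  # draw NR lines in total
                print line[int(rand() * NR) + 1]   # random index in 1..NR
        }
    ' "$1"
}
```

With the file names from the question, this could be called as: bootstrap_resample input > output. When generating many replicates in a quick loop, passing a different seed for each one is probably safer, since runs started within the same second may otherwise produce the same sample. If GNU coreutils is available, shuf with its -r (--repeat) option and -n set to the line count may be an alternative to AWK.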



