Thursday, April 9, 2020

Random sample without replacement while maintaining natural order of tabular data

I have time series data which is not monotonically increasing, so calling sort/shuffle is out of the question.

I want to randomly pull out n% of the data, while maintaining its relative order, to act as a validation or test set. For example:

my_ndarray = [1, 20, 10, 3, 90, 5, 80, 50, 4, 1]  # toy example; real data: number of samples = 1645, number of timesteps = 10, number of features = 7
# custom_train_test_split()
train = [1, 20, 90, 5, 50, 4, 1]
valid = [10, 3, 80]

I would appreciate some guidance on how to do this efficiently. To my understanding, Java-style element-by-element iteration is inefficient in Python. I suspect a 3D boolean mask would be the Pythonic, vectorized way.
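For illustration, here is a minimal sketch of the kind of split described above, assuming NumPy and assuming the real array is laid out as (samples, timesteps, features) so that whole samples are selected along axis 0. The function name `ordered_train_valid_split` and its parameters are hypothetical, not from any library. Under that assumption a 1-D boolean mask over the sample axis should be enough, and a full 3D mask is not needed.

```python
import numpy as np

def ordered_train_valid_split(data, valid_fraction=0.3, seed=None):
    """Hypothetical helper: split along axis 0, keeping original order in both parts."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    n_valid = int(round(n_samples * valid_fraction))
    # Draw the validation row indices without replacement.
    valid_idx = rng.choice(n_samples, size=n_valid, replace=False)
    # 1-D boolean mask over the sample axis; True marks validation rows.
    mask = np.zeros(n_samples, dtype=bool)
    mask[valid_idx] = True
    # Boolean indexing returns rows in their original relative order.
    return data[~mask], data[mask], mask

# Toy 1-D data from the question.
data = np.array([1, 20, 10, 3, 90, 5, 80, 50, 4, 1])
train, valid, mask = ordered_train_valid_split(data, valid_fraction=0.3, seed=0)

# Same call on an array shaped like the real data (assumed layout).
x = np.arange(1645 * 10 * 7).reshape(1645, 10, 7)
x_train, x_valid, x_mask = ordered_train_valid_split(x, valid_fraction=0.2, seed=0)
```

Because the mask is applied with boolean indexing rather than by shuffling, both parts keep the order they had in the input, and the same mask can be reused to split a matching label array.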



