I have time series data which is not monotonically increasing, so calling sort/shuffle is out of the question.
I want to randomly pull out n% of the data, while maintaining its relative order, to act as a validation or test set. For example:
my_ndarray = [1, 20, 10, 3, 90, 5, 80, 50, 4, 1]  # simplified; the real array has shape (samples=1645, timesteps=10, features=7)
# custom_train_test_split()
train = [1, 20, 90, 5, 50, 4, 1]
valid = [10, 3, 80]
I would appreciate some guidance on how to do this efficiently. To my understanding, explicit element-by-element loops (Java-style iteration) are inefficient in Python. I suspect a boolean mask along the sample axis would be the Pythonic, vectorized way.
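One possible sketch of such a split (the function name `custom_train_test_split` comes from the example above; the seed and fraction parameters are my own additions): sample indices without replacement, mark them in a boolean mask, and index with the mask. Boolean indexing in NumPy always returns elements in their original order, so relative order is preserved in both subsets, and because the mask indexes only the first axis, the same code works unchanged for a 3-D array of shape `(samples, timesteps, features)`.

```python
import numpy as np

def custom_train_test_split(data, valid_frac=0.3, seed=0):
    """Randomly assign valid_frac of the samples to a validation set,
    keeping each subset in its original (time) order."""
    rng = np.random.default_rng(seed)
    n = len(data)
    n_valid = int(n * valid_frac)
    # Draw distinct sample indices, then build a boolean mask over axis 0.
    # Indexing with a mask preserves the original ordering automatically,
    # so there is no need to sort the drawn indices.
    valid_idx = rng.choice(n, size=n_valid, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[valid_idx] = True
    return data[~mask], data[mask]

data = np.array([1, 20, 10, 3, 90, 5, 80, 50, 4, 1])
train, valid = custom_train_test_split(data, valid_frac=0.3)
```

This avoids any Python-level loop: the only per-element work happens inside NumPy's vectorized `choice` and boolean indexing. A full 3-D mask is unnecessary, since a 1-D mask over the sample axis selects whole `(timesteps, features)` slices at once.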