I am working on an ML project and want to set aside 20% of my data at random to use as validation. This is my current implementation:
import math
import numpy as np

x_train = np.transpose(np.array(x))  # rows = samples, columns = features
test_index = np.random.randint(0, len(x_train), math.floor(len(x_train) / 5))  # draw 20% of the row indices
x_test = x_train[test_index]
x_train = np.delete(x_train, test_index, 0)  # drop the test rows from the training set
print(x_train.shape)
print(x_test.shape)
However, the new x_train doesn't have the correct length: I start with 1092 rows, and x_test always ends up with 218 rows, but the length of the new x_train varies from run to run. Like so:
for i in range(5):
    x_train = np.transpose(np.array(x))
    print(len(x_train))
    test_index = np.random.randint(0, len(x_train), math.floor(len(x_train) / 5))
    x_test = x_train[test_index]
    x_train = np.delete(x_train, test_index, 0)
    print(x_train.shape)
    print(x_test.shape)
    print(len(x_train) + len(x_test))
    print("+++++")
1092
(896, 48)
(218, 48)
1114
+++++
1092
(899, 48)
(218, 48)
1117
+++++
1092
(894, 48)
(218, 48)
1112
+++++
1092
(898, 48)
(218, 48)
1116
+++++
1092
(899, 48)
(218, 48)
1117
+++++
I think what is happening is that np.delete() is deleting the rows one by one, so once it reaches the end of test_index, some of the indexes are out of range and are not deleted (which would imply that it is also not deleting the correct rows). How do I work around that? Is this behaviour expected?
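For what it's worth, counting the distinct values in test_index with np.unique (a minimal check on dummy data of the same shape as mine) suggests the sampled indices themselves contain duplicates, since np.random.randint draws with replacement:

import math
import numpy as np

# Dummy stand-in for my data: 1092 rows, 48 columns
x_train = np.zeros((1092, 48))

test_index = np.random.randint(0, len(x_train), math.floor(len(x_train) / 5))
print(test_index.size)             # 218 indices drawn
print(np.unique(test_index).size)  # consistently fewer than 218: duplicates are present

If that is what is going on, then x_test would contain repeated rows, and np.delete would remove each duplicated index only once, which would explain why fewer than 218 rows disappear from x_train.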