Wednesday, March 3, 2021

Deleting random rows from np array

I am working in ML, and want to take 20% of my data at random to use as validation. This is my current implementation:

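import math
import numpy as np

# x holds the raw data (defined elsewhere); after the transpose, rows are samples
x_train = np.transpose(np.array(x))
# draw 20% of the row indices at random
test_index = np.random.randint(0,len(x_train), math.floor(len(x_train)/5))
x_test = x_train[test_index]
# remove the selected rows from the training set
x_train = np.delete(x_train, test_index, 0)
print(x_train.shape)
print(x_test.shape)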

However, the new x_train doesn't have the correct length. I start with 1092 rows and x_test always ends up with 218 rows, but the length of the new x_train changes from run to run, and the two lengths add up to more than 1092. For example:

for i in range(5) :
    x_train = np.transpose(np.array(x))
    print(len(x_train))
    test_index = np.random.randint(0,len(x_train), math.floor(len(x_train)/5))
    x_test = x_train[test_index]
    x_train = np.delete(x_train, test_index, 0)
    print(x_train.shape)
    print(x_test.shape)
    print(len(x_train) + len(x_test))
    print("+++++")

1092
(896, 48)
(218, 48)
1114
+++++
1092
(899, 48)
(218, 48)
1117
+++++
1092
(894, 48)
(218, 48)
1112
+++++
1092
(898, 48)
(218, 48)
1116
+++++
1092
(899, 48)
(218, 48)
1117
+++++
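One way to narrow this down is to count how many distinct indices test_index actually contains. np.random.randint samples with replacement, so the same index can be drawn more than once, while np.delete removes each distinct row only once. The sketch below is only a check under assumptions: the array is a hypothetical stand-in with the same shape as in the question, not the real data, and the names n_test, n_unique and remaining are made up for illustration.

```python
import math
import numpy as np

# Hypothetical stand-in for the real data: 1092 rows, 48 columns.
x_train = np.arange(1092 * 48).reshape(1092, 48)

n_test = math.floor(len(x_train) / 5)
# Same call as in the question: indices are drawn WITH replacement.
test_index = np.random.randint(0, len(x_train), n_test)

# Number of distinct row indices that were drawn.
n_unique = len(np.unique(test_index))

# np.delete drops each distinct index once, even if it is listed several times.
remaining = np.delete(x_train, test_index, 0)

print(n_test, n_unique)
print(len(remaining) == len(x_train) - n_unique)
```

If the second print shows True on every run, the varying length comes from repeated indices in test_index and not from indices going out of range.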

I think what is happening is that np.delete() is deleting each row one by one, so once it reaches the end of test_index, some of the indexes are out of range and not deleted (which would imply that it is also not deleting the correct rows). How do I work around that? Is this behaviour expected?
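As for a workaround, one option is to draw the test indices without replacement, so that no index can appear twice. The sketch below shuffles all row indices once and slices the result. It is a minimal example under assumptions, not necessarily the best fit for the real pipeline: x_all is a hypothetical stand-in for the transposed data, and the names perm and train_index are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the real data: 1092 rows, 48 columns.
x_all = np.arange(1092 * 48).reshape(1092, 48)

rng = np.random.default_rng()

# A random permutation contains every row index exactly once.
perm = rng.permutation(len(x_all))

n_test = len(x_all) // 5          # 20% of the rows, rounded down
test_index = perm[:n_test]        # first 20% of the shuffled indices
train_index = perm[n_test:]       # everything else

x_test = x_all[test_index]
x_train = x_all[train_index]

print(x_train.shape)
print(x_test.shape)
print(len(x_train) + len(x_test))
```

With indices that are all distinct, np.delete(x_all, test_index, 0) should also give a training set of the expected length; slicing the permutation just avoids the extra call.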



