I tried a method to split data between train and test sets, but it seems to fill the train set with zeros and leave all the data in the test set.
In theory, it works:
When I apply the following function, which randomly selects some columns of the given array for each row, it works with the DataLens data stored as a numpy matrix, but not with other data.
import numpy as np

def train_test_split(array):
    test = np.zeros(array.shape)
    train = array.copy()
    for user in range(array.shape[0]):
        # Pick 10 of this user's nonzero entries (needs at least 10 per row)
        test_ratings = np.random.choice(array[user, :].nonzero()[0],
                                        size=10,
                                        replace=False)
        train[user, test_ratings] = 0.
        # Copy from the array argument, not from the global ratings
        test[user, test_ratings] = array[user, test_ratings]
    # Test and training are truly disjoint
    assert(np.all((train * test) == 0))
    return train, test

train, test = train_test_split(ratings)
With simple data, it doesn't work:
ratings:
[[ 1. 1. 0. 0. 0.]
[ 1. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 1. 0. 0. 0. 0.]
[ 0. 0. 0. 1. 1.]]
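A quick check that may help here is to count how many nonzero ratings each row of this small matrix has, since the function holds out a fixed number per row. This is only a sketch; the variable name `counts` is mine and not part of the original code.

```python
import numpy as np

# The small example matrix from above
ratings = np.array([
    [1., 1., 0., 0., 0.],
    [1., 0., 0., 0., 0.],
    [0., 0., 1., 0., 0.],
    [1., 0., 0., 0., 0.],
    [0., 0., 0., 1., 1.],
])

# Number of nonzero ratings in each row (one row per user)
counts = np.count_nonzero(ratings, axis=1)
print(counts)  # [2 1 1 1 2]
```

No row has more than two ratings, which is far fewer than the 10 the function tries to hold out.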
It fills the array with zeros one by one, even though train was a copy of ratings at the very beginning:
train:
[[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
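One plausible cause: the function holds out a fixed number of ratings per row, so a row with no more ratings than the hold-out size ends up with nothing left in train. (With `size=10` and `replace=False`, `np.random.choice` raises a ValueError on a row with fewer than 10 nonzero entries, so the size was presumably lowered for this small example.) Below is a minimal sketch of one possible workaround, which holds out a fraction of each row's ratings and always leaves at least one in train. The name `train_test_split_fraction` and the parameters `test_fraction` and `seed` are hypothetical, not from the original code.

```python
import numpy as np

def train_test_split_fraction(array, test_fraction=0.5, seed=0):
    """Hold out a fraction of each row's nonzero entries, never all of them."""
    rng = np.random.default_rng(seed)
    train = array.copy()
    test = np.zeros(array.shape)
    for user in range(array.shape[0]):
        rated = array[user, :].nonzero()[0]
        # Hold out at most len(rated) - 1 so one rating always stays in train
        n_test = min(int(len(rated) * test_fraction), len(rated) - 1)
        if n_test <= 0:
            continue  # too few ratings to hold anything out for this user
        held_out = rng.choice(rated, size=n_test, replace=False)
        train[user, held_out] = 0.
        test[user, held_out] = array[user, held_out]
    # Train and test are disjoint
    assert np.all((train * test) == 0)
    return train, test

# The small example matrix from the question
ratings = np.array([
    [1., 1., 0., 0., 0.],
    [1., 0., 0., 0., 0.],
    [0., 0., 1., 0., 0.],
    [1., 0., 0., 0., 0.],
    [0., 0., 0., 1., 1.],
])

train, test = train_test_split_fraction(ratings)
print(train)
print(test)
```

With this matrix, each row that has two ratings gives one of them to test, and rows with a single rating stay entirely in train.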