I have a dataset of ~3700 rows and need to remove 1628 of those rows based on the compliance column. The dataset looks like this:
compliance  day0  day1  day2  day3  day4
True        1     3     9     8     8
False       7     4     8     3     2
True        4     5     0     3     5
True        5     3     9     6     2
For 1628 of the rows where compliance is True, I want to remove the entire row.
The thing is, I want to do this randomly; I don't want to just remove the first 1628 matching rows. I tried this:
for z in range(1629):
    rand = random.randint(0,(3783-z)) #subtract z since dataframe shape is shrinking
    if str(data.iloc[rand,1]) == 'True':
        data = data.drop(balanced_dataset.index[rand])
But I'm getting the following error, after it removes a few rows:
'labels [2359] not contained in axis'
I also tried this:
data.drop(data("adherence.str.startswith('T').values").sample(frac=.4).index)
frac is picked arbitrarily for now; I just wanted it to work. I got the following error:
'DataFrame' object is not callable
Any help would be greatly appreciated! Thank you
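For reference, here is a minimal sketch of one way the random removal described above might be done in a single step, rather than in a loop. It is only a sketch under assumptions: the stand-in DataFrame, the helper name drop_random_true_rows, and the seed are hypothetical and not from my real data; the index is assumed to have unique labels; and compliance is compared as a string so it should work whether the column holds booleans or the text 'True'. As far as I can tell, the loop fails because drop() takes index labels while iloc takes positions, so a label looked up in balanced_dataset may already be gone from data; the second attempt fails because data(...) calls the DataFrame like a function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (~3700 rows, mostly compliant).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "compliance": rng.random(3700) < 0.7,
    "day0": rng.integers(0, 10, 3700),
    "day1": rng.integers(0, 10, 3700),
})

n_remove = 1628  # number of compliance == True rows to remove


def drop_random_true_rows(df, n, column="compliance", seed=None):
    """Return a copy of df with n randomly chosen rows removed,
    picked only from rows where `column` is True (or the string 'True')."""
    # Boolean mask; the string comparison covers both bool and text columns.
    is_true = df[column].astype(str) == "True"
    if n > is_true.sum():
        raise ValueError("not enough True rows to remove")
    # Sample n index labels from the True rows, without replacement.
    to_drop = df[is_true].sample(n=n, random_state=seed).index
    # drop() works on labels, so this assumes the index labels are unique.
    return df.drop(to_drop)


balanced = drop_random_true_rows(data, n_remove, seed=42)
print(len(data), len(balanced))
```

Passing n= to sample() removes an exact number of rows, whereas frac= removes a proportion; the seed is only there to make the result repeatable and can be left out.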