For each row, I would like to randomly sample k column indices that correspond to non-null values.
If I start with this dataframe,
import numpy as np
import pandas as pd

A = pd.DataFrame([
    [1, np.nan, 3, 5],
    [np.nan, 2, np.nan, 7],
    [4, 8, 9]  # shorter row; pandas pads the missing cell with NaN
])
>>> A
     0    1    2    3
0  1.0  NaN  3.0  5.0
1  NaN  2.0  NaN  7.0
2  4.0  8.0  9.0  NaN
If I wanted to randomly sample 2 non-null values for each row and change them to the value -1, one way that can be done is as follows:
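import random

B = A.copy()
for i in A.index:
    # Keep only the non-null entries of this row
    s = A.loc[i]
    s = s[s.notnull()]
    # Pick 2 of the remaining column labels at random
    col_idx = random.sample(s.index.tolist(), 2)
    # i and col_idx are labels, so index with .loc rather than .iloc
    B.loc[i, col_idx] = -1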
>>> B
     0    1    2    3
0 -1.0  NaN -1.0  5.0
1  NaN -1.0  NaN -1.0
2 -1.0 -1.0  9.0  NaN
Is there a better way to do this natively in pandas that avoids the for loop? The pandas.DataFrame.sample method seems to keep the number of columns sampled in each row constant. But if the dataframe has missing values, the number of non-null values in each row isn't constant.
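For reference, here is a rough sketch of one loop-free approach. It is not a built-in pandas feature, and the helper name mask_random_non_null and its parameters (k, fill_value, seed) are made up for illustration. The idea is to draw one random key per cell, push the null cells to the end by giving them an infinite key, and then treat the k smallest keys in each row as the sampled positions.

```python
import numpy as np
import pandas as pd


def mask_random_non_null(df, k, fill_value=-1, seed=None):
    """Replace k randomly chosen non-null cells per row with fill_value.

    Hypothetical helper, written as a sketch. Rows with fewer than k
    non-null cells get all of their non-null cells replaced.
    """
    rng = np.random.default_rng(seed)
    is_null = df.isna().to_numpy()

    # One uniform random key per cell; null cells get +inf so they sort last.
    keys = rng.random(df.shape)
    keys[is_null] = np.inf

    # Double argsort turns the keys into per-row ranks (0 = smallest key).
    ranks = keys.argsort(axis=1).argsort(axis=1)

    # The k lowest-ranked cells are the sample; never select a null cell.
    chosen = (ranks < k) & ~is_null
    return df.mask(chosen, fill_value)


A = pd.DataFrame([
    [1, np.nan, 3, 5],
    [np.nan, 2, np.nan, 7],
    [4, 8, 9, np.nan],
])
B = mask_random_non_null(A, k=2, seed=0)
```

Because the keys are independent and uniform, every set of k non-null columns in a row should be equally likely, which matches what random.sample does in the loop version. The cost is a sort per row, so whether this actually beats the loop depends on the shape of the dataframe.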