I'm working on a particular problem that I managed to solve in a very inefficient way, and I would like to see if there is a more efficient approach. I have a DataFrame whose first column is filled with positive values, followed by roughly 69 columns filled with 1, 0 or -1, each representing a particular condition for a certain property. I divided these columns into 4 groups based on their position and added a column in the last position (69) that contains only null values:
Group A = (69, 5, 6, 9, 10, 12, 15, 19, 20, 27, 43, 53, 54, 55)
Group B = (69, 6, 10, 15, 29, 31, 34, 36, 48, 49, 58, 59, 60, 62, 64, 68)
Group C = (69, 7, 13, 16, 17, 22, 24, 26, 28, 32, 33, 38, 40, 42, 44, 46, 51, 56, 66)
Group D = (69, 8, 11, 14, 18, 21, 23, 25, 30, 35, 37, 39, 41, 45, 47, 50, 52, 57, 61, 63, 65, 67)
What I have to do is randomly pick 1 column from each group and check whether all the values are 1 or all are -1, where the null values provided by column 69 count as wildcards: 1, null, 1, null counts as 1, 1, 1, 1, and likewise for -1.
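To make the wildcard rule concrete, here is a minimal, hypothetical illustration (the column names and values are made up): summing across the picked columns with pandas' default skipna behaviour and comparing against the count of non-null entries implements the all-1 / all--1 check:

```python
import numpy as np
import pandas as pd

# Hypothetical pick of four columns; NaN plays the role of column 69's nulls.
picked = pd.DataFrame({
    'c6':  [1,     -1,     1],
    'c69': [np.nan, np.nan, np.nan],
    'c28': [1,     -1,     1],
    'c61': [1,     -1,     0],
})

n_valid = picked.notna().sum(axis=1)   # non-wildcard columns per row
row_sum = picked.sum(axis=1)           # NaN is skipped by default

all_ones      = row_sum == n_valid     # every non-null value is 1
all_minus_one = row_sum == -n_valid    # every non-null value is -1
print(all_ones.tolist())               # → [True, False, False]
print(all_minus_one.tolist())          # → [False, True, False]
```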
Then I have to generate a series with the percentage differences between values of the first column of the DataFrame. The first value is taken from the first row that is all -1/null; starting from that row, the next value comes from the first row that is all 1/null, which gives the first difference. From there the walk continues down the whole DataFrame, selecting the next first -1/null row and the subsequent first 1/null row to produce the second, third, fourth, etc. percentage differences, each appended to the series.
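The walk described above can be sketched on a plain sign sequence (a toy illustration using the values of the example table below, not my actual frame): keep only the first row of each run of equal signs, trim a leading 1/null and a trailing -1/null, and take every other pct_change:

```python
import pandas as pd

values = pd.Series([43, 90, 40, 95, 8, 63, 73, 59, 21], dtype=float)
signs  = pd.Series([1, -1, -1, 1, 1, 1, -1, 1, -1])  # per-row all-1 / all--1 result

# Keep only the first row of each run of identical signs:
firsts = signs != signs.shift(1)
v = values[firsts]
s = signs[firsts]

# The sequence has to start at a -1 row and end at a +1 row:
if s.iloc[0] > 0:
    v, s = v.iloc[1:], s.iloc[1:]
if s.iloc[-1] < 0:
    v, s = v.iloc[:-1], s.iloc[:-1]

# One diff per (-1 -> +1) pair:
perc_diff = v.pct_change()[1::2]
print(perc_diff.round(3).tolist())  # → [0.056, -0.192]
```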
This is what the result would look like:
Values  Column 6  Column 69  Column 28  Column 61  Selected           Perc.Diff
43       1        Null        1          1         43 (1st 1/null)    NaN
90      -1        Null       -1         -1         90 (1st -1/null)   NaN
20       1        Null        1          0         NaN                NaN
40      -1        Null       -1         -1         40 (2nd -1/null)   NaN
30       1        Null        1         -1         NaN                NaN
95       1        Null        1          1         95 (1st 1/null)    0.056 (90->95)
8        1        Null        1          1         8 (2nd 1/null)     NaN
52       1        Null        0          1         NaN                NaN
63       1        Null        1          1         63 (3rd 1/null)    NaN
73      -1        Null       -1         -1         73 (1st -1/null)   NaN
59       1        Null        1          1         59 (1st 1/null)    -0.192 (73->59)
19       1        Null        0          1         NaN                NaN
21      -1        Null       -1         -1         21 (1st -1/null)   NaN
The last 2 columns don't need to be in the DataFrame, since I only need a series with the diff values, and it doesn't need to have the same length as the DataFrame. I would like to see if there is a more efficient way to do this. Here is my solution:
import random

a = [69, 5, 6, 9, 10, 12, 15, 19, 20, 27, 43, 53, 54, 55]
b = [69, 6, 10, 15, 29, 31, 34, 36, 48, 49, 58, 59, 60, 62, 64, 68]
c = [69, 7, 13, 16, 17, 22, 24, 26, 28, 32, 33, 38, 40, 42, 44, 46, 51, 56, 66]
d = [69, 8, 11, 14, 18, 21, 23, 25, 30, 35, 37, 39, 41, 45, 47, 50, 52, 57, 61, 63, 65, 67]
Generate a list with 4 random column indexes (random.choice replaces shuffling each list and taking its head, since random.shuffle only shuffles in place and returns None):
ind_list = [random.choice(a), random.choice(b), random.choice(c), random.choice(d)]
Create series based on selected values:
lim = ind_list.count(69)
df['Selected'] = 0
# .sum(axis=1) skips the NaNs of column 69, so the lim adjustment works
# even when column 69 is picked (a plain '+' chain would propagate NaN):
row_sum = df.iloc[:, ind_list].sum(axis=1)
df.loc[row_sum == len(ind_list) - lim, 'Selected'] = df.Values
df.loc[row_sum == -(len(ind_list) - lim), 'Selected'] = -df.Values
Filter values that are not "Firsts":
l1 = 0
df1 = df[df['Selected'] != 0]
df1.loc[df1.Selected.shift(1).apply(np.sign) == df1.Selected.apply(np.sign), 'Selected'] = 0
df2 = df1[df1['Selected'] != 0]
Remove the first value if it was selected with the 1/null condition (the sequence has to start with -1/null), and the last value if it is -1/null (it has no matching 1/null). Column 70 is the 'Selected' column.
if len(df2) > 0:
    if df2.iloc[0, 70] > 0:    # column 70 is 'Selected'
        l1 = 1                 # skip a leading 1/null selection
    if df2.iloc[-1, 70] < 0:   # drop a trailing -1/null selection
        perc = df2.iloc[l1:-1, 70].abs()
    else:
        perc = df2.iloc[l1:, 70].abs()
Calculate percentage differences:
perc_diff = perc.pct_change()[1::2]
The problem with this method is the if/else block, which I feel could be avoided, and also a SettingWithCopyWarning for this line of code:
df1.loc[df1.Selected.shift(1).apply(np.sign) == df1.Selected.apply(np.sign), 'Selected'] = 0
that I don't know how to avoid. Thanks for all the help you can give me; since this script is part of a loop with thousands of iterations, every ms I can save counts.
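From what I've read, making the filtered frame an explicit copy before assigning into it avoids the warning (a minimal sketch with toy data; I'm not sure it is the most efficient fix):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Selected': [3.0, -2.0, -4.0, 5.0]})

# Filtering produces a view-like object; .copy() makes ownership explicit,
# so the .loc assignment below no longer raises SettingWithCopyWarning.
df1 = df[df['Selected'] != 0].copy()
df1.loc[np.sign(df1.Selected.shift(1)) == np.sign(df1.Selected), 'Selected'] = 0
print(df1['Selected'].tolist())  # → [3.0, -2.0, 0.0, 5.0]
```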