random: Filling a fixed-size pandas DataFrame with randomly placed values from per-group counts (the reverse of .count())

Friday, June 16, 2023


I need a DataFrame with r rows and a dynamic number of columns (one per group). The input count column specifies how many True values each group's column should contain in the new DataFrame. My current implementation creates, for each group in df, a temporary single-row DataFrame holding True, explode()s that temporary DataFrame into count rows, and finally groups by row number and aggregates everything into the result df.
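The explode() step turns every element of a list-valued column into its own row and repeats the original index. The small frame `tmp` below is made up for illustration. It also shows a detail that matters for group B: an empty list, which is what range(1, 0 + 1) produces for a count of 0, becomes one NaN row rather than no row, so that group still ends up with a True in it.

```python
import pandas as pd

# Hypothetical temporary frame: one row per group, with a list of row numbers 1..count
tmp = pd.DataFrame({"group": ["A", "B"], "count": [[1, 2], []]})

# explode() emits one row per list element and repeats the index (0, 0 for "A");
# the empty list for "B" becomes a single row whose "count" is NaN
exploded = tmp.explode("count")
print(exploded)
```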

Input:

| group | count | ... |
|-------|-------|-----|
| A     | 2     |     |
| B     | 0     |     |
| C     | 4     |     |
| D     | 1     |     |

I need to fill a new DataFrame with these values at random positions. The columns are dynamic: there is one column per group, named after the group.

Expected output (here with r = 5 rows):

A	B	C	D
NaN	NaN	True	True
True	NaN	True	NaN
NaN	NaN	NaN	NaN
NaN	NaN	True	NaN
True	NaN	True	NaN
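The title calls this the reverse of .count() because DataFrame.count() tallies the non-NaN cells in each column. Rebuilding the expected output above as a DataFrame (the name `expected` is only for illustration) and calling count() on it gives back the input counts.

```python
import numpy as np
import pandas as pd

# The expected output from the question: True marks a filled cell, NaN an empty one
expected = pd.DataFrame(
    {
        "A": [np.nan, True, np.nan, np.nan, True],
        "B": [np.nan] * 5,
        "C": [True, True, np.nan, True, True],
        "D": [True, np.nan, np.nan, np.nan, np.nan],
    }
)

# count() ignores NaN, so it recovers the input "count" column: A=2, B=0, C=4, D=1
print(expected.count())
```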

I think it should be possible to draw, for each group, a random set of count row numbers between 1 and r, and then, after expanding, simply agg(sum) grouped by those row numbers.
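A minimal sketch of that idea, assuming r is known up front and every count is at most r. It uses numpy's Generator.choice with replace=False to draw distinct 0-based positions per group, then explode() as in the question. The final step uses pivot() instead of groupby/agg(sum): because positions are drawn without replacement, each (row, group) pair occurs at most once, so nothing needs summing. The names `r`, `rng`, `long` and the seed are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical input mirroring the question's table
df = pd.DataFrame({"group": ["A", "B", "C", "D"], "count": [2, 0, 4, 1]})
r = 5  # number of result rows (5 in the expected output); assumes every count <= r
rng = np.random.default_rng(seed=0)  # fixed seed so runs are reproducible

# For each group, draw `count` distinct row positions from 0..r-1;
# replace=False guarantees no position is drawn twice for the same group
df["row"] = [rng.choice(r, size=c, replace=False).tolist() for c in df["count"]]

# One row per (group, position) pair. A count of 0 gives an empty list,
# which explode() turns into a NaN row, so those rows are dropped.
long = (
    df.explode("row")
    .dropna(subset=["row"])
    .astype({"row": int})
    .assign(value=True)
)

# Back to wide form: one column per group, True at the drawn positions, NaN elsewhere.
# reindex() restores all r rows and any group that received no True (here B).
result = long.pivot(index="row", columns="group", values="value").reindex(
    index=range(r), columns=df["group"]
)
print(result)
```

The positions change with the seed, but the per-column counts always match the input.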

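My code:

import pandas as pd

inputs = [
    {"group": "A", "count": 2},
    {"group": "B", "count": 0},
    {"group": "C", "count": 4},
    {"group": "D", "count": 1},
]
df = pd.DataFrame(inputs)

def expand(count: int, group: str) -> pd.DataFrame:
    """Expand one group into `count` rows, each holding True in column `group`."""
    count = int(round(count))
    df1 = pd.DataFrame([{group: True}])
    # I'm thinking here I need to add a random seed
    # (the row numbers below are always 1..count, so they are not random yet)
    df1 = (
        df1.assign(count=[list(range(1, count + 1))])
        .explode('count')
        # explode() turns an empty list (count == 0) into one NaN row; drop it
        # so a group with count 0 gets no True value
        .dropna(subset=['count'])
        .reset_index(drop=True)
    )
    return df1

def creator(df: pd.DataFrame) -> pd.DataFrame:
    """Create a new DF for every group value (count)."""
    dfs = [expand(r, df['group'].values[0]) for r in df['count'].values]
    return pd.concat(dfs, ignore_index=True)

# The chain is wrapped in parentheses: a comment line after a trailing
# backslash ends the statement and breaks the original chain.
result = (
    df.groupby('group', as_index=False)
    .apply(creator)
    .drop('count', axis=1)
    # and group by my seed
    .groupby(level=1)
    .sum()
)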

Here are my specific questions, in case that helps:

Is there a pandas method that would make this easier or cleaner?
How can I generate random row numbers (instead of 1..count) and assign them in the expand() function?
Is there a way to create a DataFrame of the required size filled with NaN and then just drop my values into it at random positions (with something like where())?
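For the last question, one vectorised route, sketched under the assumption that every count is at most r, is to start from a frame of the target shape and let DataFrame.where() blank out everything except count randomly chosen cells per column. The random-rank trick (argsort of uniform noise per column) is a swapped-in technique, not something from the question; `counts`, `r` and the seed are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: the question's count column indexed by group, and r = 5 rows
counts = pd.Series({"A": 2, "B": 0, "C": 4, "D": 1})
r = 5  # assumes every count is <= r
rng = np.random.default_rng(seed=42)

# argsort of uniform noise along axis 0 gives each column an independent
# random permutation of 0..r-1
ranks = rng.random((r, len(counts))).argsort(axis=0)

# Broadcast the counts across rows: a cell is kept when its value is below
# its column's count, which selects exactly `count` random cells per column
mask = ranks < counts.to_numpy()

# All-True frame of the target shape; where() keeps True where mask holds, NaN elsewhere
result = pd.DataFrame(True, index=range(r), columns=counts.index).where(mask)
print(result)
```

This avoids explode() and groupby entirely; the trade-off is that it builds a dense r × groups array of noise, which is cheap for small frames.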

PS: This is my first time asking a question, so I hope I have provided enough information!
