Friday, June 16, 2023

Fill a fixed-size pandas DataFrame with randomly placed values from per-group counts (the reverse of .count())

I need a DataFrame with r rows and a dynamic number of columns (one per group). The input count column says how many True values each group's column should contain in the new DataFrame. My current implementation builds, for each group in df, a temporary single-row DataFrame holding a True value, assigns it a list of counters and then explode()s it into one row per counter. Finally, it groups by that counter and aggregates everything into the result DataFrame.

input

--

| group | count | ... |
|   A   |   2   |     |
|   B   |   0   |     |
|   C   |   4   |     |
|   D   |   1   |     |

I need to fill a new DataFrame with these values at random positions (the number of columns is dynamic, and so are their names, which come from the group column).

expected output

--

|  A   |  B  |  C   |  D   |
| NaN  | NaN | True | True |
| True | NaN | True | NaN  |
| NaN  | NaN | NaN  | NaN  |
| NaN  | NaN | True | NaN  |
| True | NaN | True | NaN  |
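A quick way to see the "reverse of .count()" relationship: typing the expected output in literally and calling DataFrame.count(), which counts the non-NA cells in each column, gives back the input counts. The frame below is just the table above; nothing else is assumed.

```python
import numpy as np
import pandas as pd

# The expected output from the question, typed in literally (r = 5 rows).
expected = pd.DataFrame(
    {
        "A": [np.nan, True, np.nan, np.nan, True],
        "B": [np.nan] * 5,
        "C": [True, True, np.nan, True, True],
        "D": [True, np.nan, np.nan, np.nan, np.nan],
    }
)

# DataFrame.count() counts non-NA cells per column, so it recovers
# the per-group "count" values from the input table.
print(expected.count().tolist())  # [2, 0, 4, 1]
```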

I think it should be possible to assign each True value a random row number from 1 to r (a randomized set, so no repeats within a group) and, after expanding, simply agg(sum) by those row numbers.
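A minimal sketch of that idea, under a few assumptions: r is fixed by the caller (5 here, to match the expected output), the row numbers are drawn with NumPy's Generator.choice without replacement, and pandas' pivot is swapped in for the groupby/agg(sum) step because it places each True directly at its (row, group) cell. The names long, out and rng are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": ["A", "B", "C", "D"], "count": [2, 0, 4, 1]})
r = 5  # number of result rows (assumed; not derived from the input)
rng = np.random.default_rng(42)  # seeded only so the run is reproducible

# For each group, draw `count` distinct row numbers from 1..r (the "randomized set")
# and emit one long-format row per True value.
long = pd.DataFrame(
    [
        {"row": row, "group": g, "value": True}
        for g, c in zip(df["group"], df["count"])
        for row in rng.choice(np.arange(1, r + 1), size=c, replace=False)
    ],
    columns=["row", "group", "value"],
)

# Pivot back to wide: rows 1..r, one column per group, NaN where nothing was drawn.
# reindex adds groups with a count of 0 (B) as all-NaN columns.
out = long.pivot(index="row", columns="group", values="value").reindex(
    index=range(1, r + 1), columns=df["group"]
)
print(out)
print(out.count().tolist())  # [2, 0, 4, 1]
```

Because pivot needs unique (row, group) pairs, drawing without replacement matters here; a count larger than r makes choice raise a ValueError instead of silently losing values.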

my code

--

import pandas as pd

inputs = [
    {"group": "A", "count": 2},
    {"group": "B", "count": 0},
    {"group": "C", "count": 4},
    {"group": "D", "count": 1},
]
df = pd.DataFrame(inputs)

def expand(count: int, group: str) -> pd.DataFrame:
    """Expand a single True value into one row per counter 1..count."""
    count = int(round(count))
    df1 = pd.DataFrame([{group: True}])
    # I think this is where the random row numbers (my "seed") need to go
    df1 = df1.assign(count=[list(range(1, count + 1))])\
             .explode('count')\
             .reset_index(drop=True)
    return df1

def creator(df: pd.DataFrame) -> pd.DataFrame:
    """Create a new DF from each group's count."""
    dfs = [expand(r, df['group'].values[0]) for r in list(df['count'].values)]
    df = pd.concat(dfs, ignore_index=True)
    return df
    
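# Parentheses instead of backslashes: a backslash followed by a comment line
# ends the statement, so the next ".groupby" would be a SyntaxError.
result = (
    df.groupby('group', as_index=False)
    .apply(creator)
    .drop('count', axis=1)
    # and groupby my seed
    .groupby(level=1)
    .agg('sum')
)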
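One hedged way to answer question 2 while keeping the shape of the code above: give expand() a hypothetical extra parameter r and tag each True with a distinct random row number from 1..r instead of range(1, count+1). The sketch also builds each piece without explode(), since explode() turns an empty list into a single NaN row, so a group with a count of 0 (B here) would still contribute a row holding True. The r parameter, the seeded rng, calling creator() directly instead of through groupby().apply(), and counting non-NA cells per (row number, group) instead of agg(sum) are all assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so runs are reproducible


def expand(count: int, group: str, r: int) -> pd.DataFrame:
    """One row per True value, each tagged with a distinct random row number in 1..r."""
    count = int(round(count))
    # Sampling without replacement keeps a group from landing twice on one row;
    # it raises ValueError if count > r.
    rows = rng.choice(np.arange(1, r + 1), size=count, replace=False)
    # Built directly (no explode()), so a count of 0 yields zero rows.
    return pd.DataFrame({group: [True] * count, "count": rows})


def creator(df: pd.DataFrame, r: int) -> pd.DataFrame:
    """Stack every group's pieces and collapse them to one row per row number."""
    parts = [expand(c, g, r) for g, c in zip(df["group"], df["count"]) if c > 0]
    if parts:
        stacked = pd.concat(parts, ignore_index=True)
        # Non-NA cells per (row number, group); > 0 marks a hit.
        hits = stacked.groupby("count").count().gt(0)
    else:
        hits = pd.DataFrame()  # every count is 0: nothing to place
    # Turn False into NaN, then add the rows and groups that were never hit.
    return hits.where(hits).reindex(index=range(1, r + 1), columns=df["group"])


df = pd.DataFrame({"group": ["A", "B", "C", "D"], "count": [2, 0, 4, 1]})
result = creator(df, r=5)
print(result)
print(result.count().tolist())  # [2, 0, 4, 1]
```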

Here are my questions, in case it helps:

  1. Is there a pandas method that makes this easier or better?
  2. How can I generate the random counts (row numbers) and assign them in the expand() function?
  3. Is there a way to create a DataFrame of the required size filled with NaN and then simply drop my values into it at random positions (with DataFrame.where or something similar)?
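For question 3, a vectorized sketch with NumPy and DataFrame.where, assuming r is known up front: build a boolean mask whose column j starts with count[j] True cells, shuffle each column independently with Generator.permuted (NumPy 1.20 or newer), and let where() fill every unmasked cell with NaN. A count larger than r is silently capped at r by the comparison, so validate the counts first if that can happen.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": ["A", "B", "C", "D"], "count": [2, 0, 4, 1]})
r = 5  # number of result rows (assumed to be chosen by the caller)
rng = np.random.default_rng()

counts = df["count"].to_numpy()
# Shape (r, n_groups): column j has counts[j] True cells at the top, False below.
mask = np.arange(r)[:, None] < counts
# Shuffle each column independently, so the True cells land on random rows.
mask = rng.permuted(mask, axis=0)

# An all-True frame of the required size; where() keeps True where the mask
# is set and fills every other cell with NaN.
out = pd.DataFrame(True, index=range(r), columns=df["group"]).where(mask)
print(out)
print(out.count().tolist())  # [2, 0, 4, 1]
```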

PS: This is my first time asking a question, so I hope I have provided enough information!



