Wednesday, 26 April 2023

Subsetting sample data with given conditions in Python

I need to subset the data for 100 households from a large DataFrame to prepare input for running test cases.

A household can be identified in the data file because all of its members share the same SERIALNO. All members of each selected household must be included in the subset. The DataFrame has 50 variables and tens of thousands of SERIALNO values, but I need to subset it based on two variables, SERIALNO and Household_member. Each SERIALNO represents one household, and Household_member numbers the people within it. I just need a subset of 100 households (SERIALNO values) that keeps every Household_member row along with the rest of the variables in the dataset.

In the example below, SERIALNO 20161 has Household_member values 1, 2 and 3, SERIALNO 20162 has a single Household_member, SERIALNO 20164 has Household_member values 1, 2 and 3, and so on. Some households have up to 15 members. How do I create a subset of 100 households (by SERIALNO) that includes every household member, as shown below? Please help with the Python code to subset this dataset. I am just learning Python and pandas. I would greatly appreciate your assistance.

SERIALNO  Household_member
20161     1
20161     2
20161     3
20162     1
20164     1
20164     2
20164     3
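
One possible way to do this, as a minimal sketch rather than a definitive answer: pick the SERIALNO values first, then keep every row whose SERIALNO is among them, so that whole households stay together. The toy DataFrame, the helper name sample_households and the seed parameter are assumptions made for illustration. Only the column names SERIALNO and Household_member come from the question.

```python
import pandas as pd

# Toy data standing in for the real 50-column DataFrame (assumption).
df = pd.DataFrame({
    "SERIALNO": [20161, 20161, 20161, 20162, 20164, 20164, 20164],
    "Household_member": [1, 2, 3, 1, 1, 2, 3],
})


def sample_households(data, n_households, seed=0):
    """Return all rows belonging to n_households randomly chosen SERIALNO values.

    Hypothetical helper: households are sampled, not individual rows, so
    every member of a chosen household is kept with all of its columns.
    """
    # One entry per household.
    ids = data["SERIALNO"].drop_duplicates()
    # Guard against asking for more households than exist.
    n = min(n_households, len(ids))
    chosen = ids.sample(n=n, random_state=seed)
    # Keep every row whose SERIALNO was chosen.
    return data[data["SERIALNO"].isin(chosen)]


# With the real data this would be n_households=100.
subset = sample_households(df, n_households=2)
print(subset)
```

If the first 100 households are acceptable instead of a random draw, replacing the `ids.sample(...)` call with `ids.head(n)` should give the same structure without randomness.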




