random: Creating/updating new field with random match from multiple matches in second dataframe

Monday, January 4, 2021


I have a dataframe Counties:

CountyID     CountyName     SalesRep     FiscalQuarter     Sales
185           Cuyahoga       Winslow      2Q19             4,564
276           Waterton       Smith        1Q17             900

And a second dataframe CountyZips with County IDs and all the zip codes it contains:

CntyID     Zip
185          05643
185          05617
185          05866
276          32786
276          33465
276          34119

I want to either update the first dataframe with a new "Zip" column or make a new dataframe with that column, and populate it with a random match from the second dataframe's Zip column. In other words, there are multiple zips associated with one county ID in the second dataframe, and I'd like to get just one of them. It should be random rather than always the first or the last (it could happen to be the first or last, just not every time), and I don't want to specify the 3rd, 4th, etc. match, because sometimes there might be just one match, or no match. So, my desired result (either dataframe 1 updated or a new dataframe):

CountyID     Zip     CountyName     SalesRep     FiscalQuarter     Sales
185          05617   Cuyahoga       Winslow      2Q19             4,564
276          34119   Waterton       Smith        1Q17             900

Note that each Zip value is a random zip from dataframe 2 where the county ID matches between both datasets.

I found one seemingly applicable answer to this on a previous question, where the solution was:

d1[d2, on = .(gender, year, code),
  {ri <- sample(.N, 1L)
  .(amount = amount[ri], status = status[ri])}, by = .EACHI]

I tried this, modifying the dataframe and field names as appropriate (I am matching on only one field, not three), but every attempt raised a syntax error, even when I built dataframes, fields, and data matching those in the original question. I'm not sure whether this is a Python versioning issue (I'm using Python 3.7.4).

If anyone can help me with this I'd appreciate it. Thanks for your time
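
For what it's worth, the snippet quoted above looks like R data.table syntax (`<-`, `.N`, `.EACHI`), which would explain a syntax error under any Python version. Below is a minimal pandas sketch of one possible approach, assuming the two dataframes look like the samples in the question. The variable names `counties`, `county_zips`, `random_zips`, and `result` are made up for illustration. The idea is to shuffle the zip table, keep one row per county ID, and left-merge that onto the counties table so counties with no match are kept with a missing Zip.

```python
import pandas as pd

# Sample data copied from the question. Zip is stored as a string so that
# leading zeros (e.g. "05617") are not lost.
counties = pd.DataFrame({
    "CountyID": [185, 276],
    "CountyName": ["Cuyahoga", "Waterton"],
    "SalesRep": ["Winslow", "Smith"],
    "FiscalQuarter": ["2Q19", "1Q17"],
    "Sales": ["4,564", "900"],
})
county_zips = pd.DataFrame({
    "CntyID": [185, 185, 185, 276, 276, 276],
    "Zip": ["05643", "05617", "05866", "32786", "33465", "34119"],
})

# Shuffle every row, then keep the first row seen for each county ID.
# After a full shuffle, the first row per county is a uniformly random one.
random_zips = (
    county_zips.sample(frac=1)
    .drop_duplicates(subset="CntyID")
    .rename(columns={"CntyID": "CountyID"})
)

# Left merge: every county row is kept; counties without any zip get NaN.
result = counties.merge(random_zips, on="CountyID", how="left")

# Optional: move Zip next to CountyID, as in the desired output.
result = result[["CountyID", "Zip", "CountyName", "SalesRep",
                 "FiscalQuarter", "Sales"]]
print(result)
```

Passing `random_state=...` to `sample` should make the pick reproducible. Note that with this sketch, if the counties table had several rows for the same county (for example, one per fiscal quarter), they would all receive the same zip; picking independently per row would need a different approach.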
