I have a dataframe Counties:
CountyID CountyName SalesRep FiscalQuarter Sales
185 Cuyahoga Winslow 2Q19 4,564
276 Waterton Smith 1Q17 900
And a second dataframe CountyZips with county IDs and all the zip codes each county contains:
CntyID Zip
185 05643
185 05617
185 05866
276 32786
276 33465
276 34119
I want to either update the first dataframe with a new "zip" column or make a new dataframe with that zip column, and populate the "zip" field with a random match from the second dataframe's Zip column. In other words, there are multiple zips associated with one county ID in the second dataframe, and I'd like to get just one of them, chosen at random. The random pick could technically be the first or last match; I just don't want it to ALWAYS be the first or last. I also don't want to specify the 3rd, 4th, etc. match, because sometimes there might be just one match, or no match. So, my desired result (either dataframe 1 updated or a new dataframe):
CountyID Zip CountyName SalesRep FiscalQuarter Sales
185 05617 Cuyahoga Winslow 2Q19 4,564
276 34119 Waterton Smith 1Q17 900
Note that the zips were updated with a random zip from dataframe 2 where County ID matches between both datasets.
I found one seemingly applicable answer to this on a previous question, where the solution was:
d1[d2, on = .(gender, year, code),
{ri <- sample(.N, 1L)
.(amount = amount[ri], status = status[ri])}, by = .EACHI]
I tried this, modifying the dataframe and field names as appropriate (I am matching on only one field, not 3), but every attempt gave a syntax error, even when I made dataframes, fields, and data that matched those in the original question. So I'm not sure if this is a Python versioning issue or not (I'm using Python 3.7.4).
If anyone can help me with this I'd appreciate it. Thanks for your time
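One possible reason for the syntax error: the quoted snippet looks like R data.table syntax (`.N`, `.EACHI`, `<-`), which would not parse under any Python version. Below is a minimal pandas sketch of the same idea, assuming both tables are pandas DataFrames and that zips are stored as strings so leading zeros survive. The function name `add_random_zip` and its `seed` parameter are made up for illustration; they are not from the original question.

```python
import pandas as pd

# Sample data copied from the question (zips kept as strings to preserve leading zeros).
counties = pd.DataFrame({
    "CountyID": [185, 276],
    "CountyName": ["Cuyahoga", "Waterton"],
    "SalesRep": ["Winslow", "Smith"],
    "FiscalQuarter": ["2Q19", "1Q17"],
    "Sales": ["4,564", "900"],
})
county_zips = pd.DataFrame({
    "CntyID": [185, 185, 185, 276, 276, 276],
    "Zip": ["05643", "05617", "05866", "32786", "33465", "34119"],
})


def add_random_zip(counties, county_zips, seed=None):
    """Hypothetical helper: attach one randomly chosen zip per county ID."""
    picked = (
        county_zips
        .sample(frac=1, random_state=seed)   # shuffle every zip row
        .drop_duplicates(subset="CntyID")    # keep the first row per county after the shuffle
        .rename(columns={"CntyID": "CountyID"})
    )
    # A left merge keeps counties that have no zip at all; their Zip is NaN.
    return counties.merge(picked, on="CountyID", how="left")


result = add_random_zip(counties, county_zips)
print(result)
```

Shuffling and then keeping the first row per `CntyID` handles counties with a single zip, and the left merge leaves `Zip` empty for counties with no match. Note that if Counties has several rows with the same CountyID, they would all receive the same zip under this approach, and the new `Zip` column is appended last rather than second.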