Monday, 30 August 2021

Efficient data.table method to generate additional rows given random numbers

I have a large data.table from which I want to generate a random number (using two columns) and then perform a calculation with it. I want to repeat this step 1,000 times, and I am looking for an efficient way to do it without a loop.

Example data:

> library(data.table)
> dt <- data.table(Group=c(rep("A",3),rep("B",3)), 
                   Year=rep(2020:2022,2), 
                   N=c(300,350,400,123,175,156),
                   Count=c(25,30,35,3,6,8), 
                   Pop=c(1234,1543,1754,2500,2600,2400))
> dt
   Group Year   N Count  Pop
1:     A 2020 300    25 1234
2:     A 2021 350    30 1543
3:     A 2022 400    35 1754
4:     B 2020 123     3 2500
5:     B 2021 175     6 2600
6:     B 2022 156     8 2400
> dt[, rate := rpois(.N, lambda=Count)/Pop*100000]
> dt[, value := N*(rate/100000)]
> dt
   Group Year   N Count  Pop      rate     value
1:     A 2020 300    25 1234 1944.8947 5.8346840
2:     A 2021 350    30 1543 2009.0732 7.0317563
3:     A 2022 400    35 1754 1938.4265 7.7537058
4:     B 2020 123     3 2500  120.0000 0.1476000
5:     B 2021 175     6 2600  115.3846 0.2019231
6:     B 2022 156     8 2400  416.6667 0.6500000
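Because rate is multiplied by 100,000 and value then divides it by 100,000 again, the two steps reduce to value = N * X / Pop, where X is a Poisson draw with lambda = Count. The minimal R sketch below checks that on the example table; the fixed seed and the names x, two_step and one_step are only illustrative and do not come from the question.

```r
library(data.table)

dt <- data.table(Group = c(rep("A", 3), rep("B", 3)),
                 Year  = rep(2020:2022, 2),
                 N     = c(300, 350, 400, 123, 175, 156),
                 Count = c(25, 30, 35, 3, 6, 8),
                 Pop   = c(1234, 1543, 1754, 2500, 2600, 2400))

set.seed(1)                              # illustrative: fix the draws so both routes use the same numbers
x <- rpois(nrow(dt), lambda = dt$Count)  # one Poisson draw per row; lambda is matched elementwise

# Two-step version from the question: rate per 100,000, then value
two_step <- dt$N * ((x / dt$Pop * 100000) / 100000)
# The 100,000 factors cancel, so value is just N * draw / Pop
one_step <- dt$N * x / dt$Pop

stopifnot(isTRUE(all.equal(two_step, one_step)))
```

Keeping the rate column is still useful if it is reported on its own; the simplification only shows that value does not depend on the 100,000 scaling.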

I want to repeat the calculation of value 1,000 times and keep every result, with an indicator column (1 to 1,000) recording which run each row came from, all without using a loop. Any suggestions?
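One common loop-free approach, sketched here under the assumption that the six-row example stands in for the large table, is to stack one copy of the table per run, tag each copy with a run number, and then make a single vectorised rpois() call over all stacked rows. The names n_runs and sims are hypothetical, chosen only for this sketch.

```r
library(data.table)

dt <- data.table(Group = c(rep("A", 3), rep("B", 3)),
                 Year  = rep(2020:2022, 2),
                 N     = c(300, 350, 400, 123, 175, 156),
                 Count = c(25, 30, 35, 3, 6, 8),
                 Pop   = c(1234, 1543, 1754, 2500, 2600, 2400))

n_runs <- 1000L  # hypothetical name: number of simulation runs

# Stack n_runs full copies of dt (rows 1..6, 1..6, ...); subsetting returns a new table
sims <- dt[rep(seq_len(nrow(dt)), times = n_runs)]
# Each block of nrow(dt) rows is one run, so the run id changes every nrow(dt) rows
sims[, run := rep(seq_len(n_runs), each = nrow(dt))]

# One vectorised Poisson draw per (row, run) pair; lambda = Count is matched row by row
sims[, rate := rpois(.N, lambda = Count) / Pop * 100000]
sims[, value := N * (rate / 100000)]

# Move the run indicator to the front for readability
setcolorder(sims, "run")
```

The result has nrow(dt) * 1,000 rows, so for a very large table memory rather than speed may become the limiting factor; in that case the runs could be generated in batches.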



