Wednesday, 22 November 2017

Expected versus observed: how to find statistical significance using bootstrapping in R?

I have data on 200,000 partnerships between firms. Each firm is assigned an industry code. Say there are 5 industries.

From this data I can make a 5 x 5 matrix of observed inter-industry partnership counts.

From the same data I want to generate the expected inter-industry partnership counts under random matching, and then compute the ratio of observed to expected for each pair of industries. How would one do this in R?

Finally, can bootstrapping be used to find the statistical significance of this ratio?

Example data:

df <- data.frame(firm_x     = c("A", "A", "B", "C", "C"),
                 industry_x = c("1", "1", "2", "3", "5"),
                 firm_y     = c("B", "C", "D", "D", "E"),
                 industry_y = c("2", "5", "4", "4", "1"))

df

  firm_x industry_x firm_y industry_y
1      A          1      B          2
2      A          1      C          5
3      B          2      D          4
4      C          3      D          4
5      C          5      E          1
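
To make the question concrete, here is a minimal sketch of what I have in mind, using the toy data above. It rests on assumptions of mine: "random matching" is taken to mean that the industry on the firm_x side is independent of the industry on the firm_y side (so expected = row total * column total / N), the pairs are treated as ordered (x, y), and significance is assessed by shuffling the industry_y labels. Strictly speaking that is a permutation test rather than a bootstrap. The names inds, count_pairs, n_perm and p_upper are just illustrative.

```r
# Toy data from the question (5 partnerships, industries coded "1".."5")
df <- data.frame(firm_x     = c("A", "A", "B", "C", "C"),
                 industry_x = c("1", "1", "2", "3", "5"),
                 firm_y     = c("B", "C", "D", "D", "E"),
                 industry_y = c("2", "5", "4", "4", "1"))

inds <- as.character(1:5)  # assumed full set of industry codes

# Observed counts: rows = industry of firm_x, columns = industry of firm_y
count_pairs <- function(x, y) {
  table(factor(x, levels = inds), factor(y, levels = inds))
}
observed <- count_pairs(df$industry_x, df$industry_y)

# Expected counts if the two sides were matched independently:
# (row total * column total) / number of partnerships
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Observed-to-expected ratio; undefined (NA) where the expected count is 0
ratio <- ifelse(expected > 0, observed / expected, NA)

# Permutation test: shuffle industry_y across partnerships to simulate
# random matching, and record the 25 cell counts for each shuffle
set.seed(1)
n_perm <- 2000
perm <- replicate(n_perm,
                  as.vector(count_pairs(df$industry_x, sample(df$industry_y))))

# One-sided p-value per cell: share of shuffles with a count at least as
# large as the observed one (with the usual +1 correction)
p_upper <- matrix((rowSums(perm >= as.vector(observed)) + 1) / (n_perm + 1),
                  nrow = length(inds), dimnames = dimnames(expected))
```

With only 5 rows the p-values are of course meaningless; the point is the mechanics. As far as I understand, a bootstrap proper (resampling the rows of df with replacement and recomputing ratio each time) would instead give a confidence interval for each ratio, which could then be checked against 1.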



