Tuesday, October 15, 2019

How to calculate the probability of getting a specific value in a random subsample in R?

I have 73 houses, each classified as positive (1) or negative (0) for a disease. Each of the 73 rows is a house, and there is a single column holding these values.

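# 73 houses: 1 = positive, 0 = negative
house <- c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0,
           0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
           1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           1, 0, 0, 1, 1, 1, 1, 0, 0, 1,
           1, 0, 0)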

I would like to know the probability of getting at least one positive (1) if I randomly select 10 houses.
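Because the 10 houses are drawn without replacement from a fixed set, the number of positives X in a sample follows a hypergeometric distribution. Counting the data above gives K = 15 positives among N = 73 houses (my own count, worth double-checking), so with n = 10 the exact probability would be:

```latex
\begin{aligned}
P(X \ge 1) &= 1 - P(X = 0) = 1 - \frac{\binom{N-K}{n}}{\binom{N}{n}} \\
           &= 1 - \frac{\binom{58}{10}}{\binom{73}{10}}
            = 1 - \prod_{i=0}^{9} \frac{58 - i}{73 - i} \\
           &\approx 1 - 0.0840 = 0.9160
\end{aligned}
```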

I used the following code:

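# Draw 1000 random samples of 10 houses without replacement (one sample per column)
test <- replicate(1000, sample(house, size = 10, replace = FALSE))
# Count the samples that contain at least one positive house
m <- sum(colSums(test == 1) > 0)
m / 1000
[1] 0.909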

I got a probability of about 0.91.
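As a cross-check, the simulated value can be compared with the exact hypergeometric probability. This is a minimal sketch that assumes 15 positives among the 73 houses (counted from the data above); `pool` is a hypothetical stand-in for `house` with the same counts, which is enough because the order of the houses does not matter for random sampling.

```r
# Sketch: compare a Monte Carlo estimate with the exact hypergeometric value.
# Assumption: 15 positives among 73 houses (counted from the data above).
n_total <- 73
n_pos   <- 15
n_neg   <- n_total - n_pos
k       <- 10  # houses drawn per sample, without replacement

# Exact: P(at least one positive) = 1 - P(no positive)
p_exact  <- 1 - dhyper(0, m = n_pos, n = n_neg, k = k)
# Cross-check with the closed form 1 - choose(58, 10) / choose(73, 10)
p_choose <- 1 - choose(n_neg, k) / choose(n_total, k)

# Simulation on a hypothetical pool with the same counts as `house`
pool <- rep(c(1, 0), times = c(n_pos, n_neg))
set.seed(42)
p_sim <- mean(replicate(10000, any(sample(pool, size = k) == 1)))

print(c(exact = p_exact, closed_form = p_choose, simulated = p_sim))
```

The exact value should come out at about 0.916; the simulated value fluctuates around it and gets closer as the number of replicates grows.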

Then I used prop.test to obtain a 95% confidence interval:

prop.test(m, 1000, conf.level=0.95, correct = FALSE)

1-sample proportions test without continuity correction

data:  m out of 1000, null probability 0.5
X-squared = 669.12, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.8895744 0.9252953
sample estimates:
    p 
0.909 
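With `correct = FALSE`, the one-sample interval from `prop.test()` is the Wilson score interval. The sketch below recomputes it by hand from the counts 909 and 1000 shown above and, for comparison, adds the exact (Clopper-Pearson) interval from `binom.test()`; it is an illustration, not a recommendation of one interval over the other.

```r
# Sketch: reproduce the prop.test() interval by hand (Wilson score interval)
# and compare it with the exact Clopper-Pearson interval from binom.test().
x <- 909    # simulated samples with at least one positive (from the output above)
n <- 1000   # number of simulated samples
p_hat <- x / n
z <- qnorm(0.975)  # two-sided 95%

# Wilson score interval (what prop.test reports with correct = FALSE)
center <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(center - half, center + half)

ptest <- prop.test(x, n, conf.level = 0.95, correct = FALSE)
cp    <- binom.test(x, n, conf.level = 0.95)

print(rbind(wilson_by_hand  = wilson,
            prop_test       = as.numeric(ptest$conf.int),
            clopper_pearson = as.numeric(cp$conf.int)))
```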

I would like to know whether, with this procedure, I can state that the probability of getting at least one positive in a sample of 10 houses is 0.909 (95% CI 0.890 to 0.925).
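One reading of the interval in question: since the 73 houses are treated as fixed, the interval from `prop.test()` describes the Monte Carlo uncertainty of an estimate based on 1000 simulated samples, not uncertainty about the houses themselves. A quick way to check this is to rerun the simulation with more replicates and watch the interval narrow. The sketch assumes the same 15 positives among 73 houses and uses a hypothetical `pool` vector in place of `house`.

```r
# Sketch: the prop.test() interval reflects Monte Carlo error only, so it
# shrinks as the number of simulated samples (B) grows.
# Assumption: 15 positives among 73 houses, as counted from the data above.
pool <- rep(c(1, 0), times = c(15, 58))
Bs <- c(1000, 10000, 100000)
widths <- numeric(length(Bs))

set.seed(123)
for (i in seq_along(Bs)) {
  B <- Bs[i]
  # Number of samples of 10 houses containing at least one positive
  hits <- sum(replicate(B, any(sample(pool, size = 10) == 1)))
  ci <- prop.test(hits, B, correct = FALSE)$conf.int
  widths[i] <- ci[2] - ci[1]
  cat(sprintf("B = %6d  estimate = %.4f  95%% CI = [%.4f, %.4f]  width = %.4f\n",
              as.integer(B), hits / B, ci[1], ci[2], widths[i]))
}
```

The width falls roughly in proportion to 1/sqrt(B), which is what one would expect if the interval only measures simulation noise.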

Thanks in advance!



