Sunday, 5 January 2020

R function/method to sample data frame using probability until condition is reached

I have a data frame with 3 columns:

OBJECTID: the unique identifier of a polygon (or row)
AvgWTRisk: probability (0-1) of a disturbance in a forest; ~0.11 is the highest value
HA: area of a polygon in the forest

I want to develop a function to create a random sample from the data frame, based on the probability value. Here's an example of the data structure:

      OBJECTID AvgWTRisk        HA
32697    32697 0.0008456 7.7465000
36480    36480 0.0050852 7.9329797
13805    13805 0.0173463 0.7154995
38796    38796 0.0026580 0.2882192
8494      8494 0.0089310 6.4686595
23609    23609 0.0090647 6.1246000

I am attempting to do this using the sample() function in R.

Is there a way to use a total area (here 100, within ±5%) as my 'size = ' target instead of a number of rows? This is what I tried:

Landscape_WTDisturbed <- Landscape_WTRisk[sample(1:nrow(Landscape_WTRisk),
                                                 size = sum(HA >= 100*0.95 && HA <= 100*1.05),
                                                 prob = WTProb, replace = FALSE),]

where WTProb is a vector of AvgWTRisk, i.e. 'WTProb <- as.vector(Landscape_WTRisk$AvgWTRisk)', and HA is the area column from the data frame.

The sample selection above returns a data frame with all of the columns but no rows.

As opposed to:

Landscape_WTDisturbed <- Landscape_WTRisk[sample(1:nrow(Landscape_WTRisk),
                                                 size = 10,
                                                 prob = WTProb, replace = FALSE),]

This works and returns a sample of 10 rows. However, I have no control over the total area being selected.
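One likely reading of the empty result from the first attempt: 'size' has to be a row count, and sum() over a logical test counts how many elements pass the test rather than adding up areas. The sketch below uses the six HA values from the example and the elementwise '&' operator (an assumption made for illustration; '&&' is not elementwise, and depending on the R version it either looks only at the first elements or raises an error for longer vectors). No single polygon is anywhere near 95-105, so the count is 0.

```r
# HA values copied from the example rows above.
HA <- c(7.7465000, 7.9329797, 0.7154995, 0.2882192, 6.4686595, 6.1246000)

# Elementwise test: which individual polygons have an area between 95 and 105?
in_range <- HA >= 100 * 0.95 & HA <= 100 * 1.05
size_arg <- sum(in_range)   # counts TRUE values; it does not add up areas

# A size of zero makes sample() return an empty index vector, so subsetting a
# data frame with it keeps the columns but returns no rows.
idx <- sample(seq_along(HA), size = size_arg)
length(idx)
```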

Should I try to achieve this with a while loop, where the summed area of all selected rows is the criterion, and small selections of rows are added incrementally until the target is reached?
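A minimal sketch of the incremental idea, without an explicit loop: drawing all rows once with sample.int(..., replace = FALSE, prob = ...) gives the same kind of sequence as adding one weighted draw at a time, so the sample can be cut at the first point where the running area reaches the target. The function name sample_until_area and its arguments are hypothetical, the toy data frame is built from the six example rows, and the target of 10 is chosen only so that the toy data can reach it. The sketch assumes every AvgWTRisk value is positive and does not enforce the ±5% tolerance; the last row added may overshoot the target.

```r
# Hypothetical helper: draw rows without replacement, weighted by `prob_col`,
# and keep them until the cumulative area in `area_col` first reaches
# `target_area`.
sample_until_area <- function(df, target_area,
                              prob_col = "AvgWTRisk", area_col = "HA") {
  n <- nrow(df)
  # A full weighted draw without replacement is a random ordering of all rows
  # in which higher-risk rows tend to come earlier.
  ord <- sample.int(n, size = n, replace = FALSE, prob = df[[prob_col]])
  cum_area <- cumsum(df[[area_col]][ord])
  # Position of the first draw at which the running area meets the target;
  # if the target is never met, every row is kept.
  reached <- which(cum_area >= target_area)
  k <- if (length(reached) > 0) reached[1] else n
  df[ord[seq_len(k)], , drop = FALSE]
}

# Toy data copied from the example rows in the question.
Landscape_WTRisk <- data.frame(
  OBJECTID  = c(32697, 36480, 13805, 38796, 8494, 23609),
  AvgWTRisk = c(0.0008456, 0.0050852, 0.0173463, 0.0026580, 0.0089310, 0.0090647),
  HA        = c(7.7465000, 7.9329797, 0.7154995, 0.2882192, 6.4686595, 6.1246000)
)

set.seed(1)
Landscape_WTDisturbed <- sample_until_area(Landscape_WTRisk, target_area = 10)
sum(Landscape_WTDisturbed$HA)  # at least 10; the six rows total about 29.3
```

If the tolerance matters, one option is to repeat the draw until the selected area falls inside the 95-105 window, at the cost of slightly changing the selection probabilities.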

Thank you in advance!



