Friday, 26 August 2016

How to assign multiple event IDs per column in a matrix

I am a newbie to R and am working through An Introduction to R. I have a large matrix with rows in multiples of 100,000 and a few hundred columns. An example matrix is given below.

set.seed(1)
m4 <- matrix(sample(0:3,5*4, replace=TRUE),5,4) # sample event matrix
m4GROUP <- data.frame( X1=rowSums(m4[,1, drop=FALSE]), X2=rowSums(m4[,2:3]), 
                       X3=rowSums(m4[,4, drop=FALSE]) ) # events per row for col 1, cols 2-3, col 4
m4GroupColID <- colSums(m4GROUP) # total events (IDs needed) per group

Output

m4
     [,1] [,2] [,3] [,4]
[1,]    1    3    0    1
[2,]    1    3    0    2
[3,]    2    2    2    3
[4,]    3    2    1    1
[5,]    0    0    3    3

> m4GROUP # group by Col 1, Cols 2-3, Col4
  X1 X2 X3
1  1  3  1
2  1  3  2
3  2  4  3
4  3  3  1
5  0  3  3

> m4GroupColID
X1 X2 X3 
 7 16 10 
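
Assuming the IDs are meant to be numbered consecutively from one group to the next, these totals translate into ID ranges through a cumulative sum. This is only a sketch; `startID` and `endID` are names chosen here for illustration, and the totals are copied from the printed output rather than recomputed.

```r
# Group totals copied from the printed m4GroupColID above
m4GroupColID <- c(X1 = 7, X2 = 16, X3 = 10)

# The last ID of each group is the running total;
# the first ID follows directly after the previous group's last ID
endID   <- cumsum(m4GroupColID)
startID <- endID - m4GroupColID + 1

rbind(startID, endID)
```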

  • I need to generate one matrix of event IDs for each of the 4 columns, i.e. 4 matrices in total, each with dimension 5 x max(m4GROUP) for that column's group.
  • Within a row, IDs are sampled with replacement (i.e. when a group is formed by two or more columns, the same ID can appear in more than one of those columns).
  • Between rows, IDs are not replaced (i.e. IDs of a previous row cannot be allocated to the next row in the same group); however, the IDs of the current row can be shared between the columns of the group.
  • Finally, for groups formed by more than one column, the row's IDs can be allocated randomly among the columns. This ensures that groups with more than one column are impacted by the IDs.

I'm sorry this is a long post and I just do not know how to shorten this without losing the essence.

The total event IDs are 7 for col 1, 16 for cols 2-3 and 10 for col 4, so the ID ranges are 1-7, 8-23 and 24-33.

With a random allocation in cols 2-3, for example, the final outputs per column are as follows:

Col 1    Col 2        Col 3        col 4
1 0 0    8 9 10 0     0 0 0 0      24 0 0 
2 0 0    11 12 13 0   0 0 0 0      25 26 0
3 4 0    14 15 17 0   14 16 0 0    27 28 29
5 6 7    18 20 0 0    18 0 0 0     30 0 0 
0 0 0    0 0 0 0      21 22 23 0   31 32 33
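
One possible reading of the rules above can be sketched as follows. The assumptions are mine: every column receives exactly as many IDs as its event count in m4, each column draws from its row's own block of IDs without repeats inside the column, and the columns of a group draw independently, so the same ID may land in both and some IDs of a block may go unused. The helper name `allocate_ids` and the `groups` list are hypothetical, and m4 is typed in from the printed output because `set.seed()` results differ between R versions.

```r
# Event counts copied from the printed m4 above
m4 <- matrix(c(1, 3, 0, 1,
               1, 3, 0, 2,
               2, 2, 2, 3,
               3, 2, 1, 1,
               0, 0, 3, 3), nrow = 5, byrow = TRUE)

groups <- list(1, 2:3, 4)   # column grouping: col 1, cols 2-3, col 4

# Hypothetical helper: returns one zero-padded ID matrix per column of m
allocate_ids <- function(m, groups) {
  out <- vector("list", ncol(m))
  offset <- 0                                   # last ID handed out so far
  for (cols in groups) {
    n <- rowSums(m[, cols, drop = FALSE])       # IDs needed per row in this group
    width <- max(n)                             # pad every row to the widest one
    last <- offset + cumsum(n)                  # last ID of each row's block
    for (j in cols) out[[j]] <- matrix(0, nrow(m), width)
    for (i in seq_len(nrow(m))) {
      pool <- seq_len(n[i]) + last[i] - n[i]    # this row's own block of IDs
      for (j in cols) {
        k <- m[i, j]
        if (k > 0) {
          # index-based sampling avoids sample()'s special case for length-1 input
          out[[j]][i, seq_len(k)] <- sort(pool[sample.int(length(pool), k)])
        }
      }
    }
    offset <- offset + sum(n)                   # next group continues the numbering
  }
  out
}

set.seed(1)
res <- allocate_ids(m4, groups)
res[[1]]   # 5 x 3 matrix for column 1
res[[2]]   # 5 x 4 matrix for column 2 (random within each row's block)
```

Single-column groups come out deterministic because a column that needs every ID of its row simply gets the whole block. The explicit row loop is kept for readability and makes no claim about speed on a matrix with hundreds of thousands of rows.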

If we have just one event per row, the id generation and distribution are fairly straightforward.

m1 <- matrix(sample(0:1,5*4, replace=TRUE),5,4) # sample event matrix
ifelse(m1==0, 0, matrix(sample(1:1,5*4, replace = T), 5,4)) # works for one  ID assignment

I started with a loop, knowing that for loops are not fast in R. But I am getting errors, with or without replacement. I definitely need replacement (replace = TRUE) within the same row and no replacement (replace = FALSE) between rows.

nc <- max(m4GROUP$X1)
i <- 1
j <- 1
while(i <- 5){
  if(m4[i,j] == 0){
    m[i,j] <- matrix(0, 5, nc)
  } else {
    if(m4[i,j] == 1){
      m[i,j] <- matrix(sample(1, i*nc, replace=T), 5, nc)
    } else {
      m[i,j] <- matrix(sample(2, i*nc, replace=T), 5, nc)
    }
  }
}

I'm aware it is not correct. I get the following errors.

Error in m[i, j] <- matrix(sample(0, i * nc, replace = T), 5, nc) : 
  number of items to replace is not a multiple of replacement length
Error in m[i, j] <- matrix(0, 5, nc) : 
  number of items to replace is not a multiple of replacement length
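
For context on the message itself: `m[i, j]` addresses a single cell, while the right-hand side is a whole 5 x nc matrix, and that length mismatch is what R reports. Separately, `while(i <- 5)` assigns 5 to `i` instead of comparing, so the condition is always true. A minimal sketch for column 1 only, assuming consecutive IDs and using names (`m4col1`, `nextID`) introduced here for illustration:

```r
m4col1 <- c(1, 1, 2, 3, 0)          # first column of the printed m4
nc <- max(m4col1)                   # the widest row decides the number of ID slots

m <- matrix(0, length(m4col1), nc)  # preallocate the full 5 x nc result once
nextID <- 1
for (i in seq_along(m4col1)) {      # iterate over rows; no assignment in the loop condition
  k <- m4col1[i]
  if (k > 0) {
    # fill k cells of row i, so both sides of the assignment have length k
    m[i, seq_len(k)] <- seq(nextID, length.out = k)
    nextID <- nextID + k
  }
}
m
```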

I also tried the following, but the results do not satisfy the bullet points above. Every row starts with the same ID, and I get only 3 rows, not 4. Finally, this approach is not ideal considering my actual sample size.

size <- c(1:7, 1:16, 1:9, 10)
startID <- which(size==1)
endIDs <- c(which(size==1)[-1] -1, length(size))
mats <- mapply(function(x, y) t(size[seq(x, y)]), startID, endIDs)
library(plyr)
m <- (rbind.fill.matrix(mats))

output:

m
     1 2 3 4 5 6 7  8  9 10 11 12 13 14 15 16
[1,] 1 2 3 4 5 6 7 NA NA NA NA NA NA NA NA NA
[2,] 1 2 3 4 5 6 7  8  9 10 11 12 13 14 15 16
[3,] 1 2 3 4 5 6 7  8  9 10 NA NA NA NA NA NA

Once again, I am sorry for the long post. Thank you not only for reading it, but also for the help. Thanks a ton.



