Thursday, 19 March 2015

Selecting a random row from a data.frame and assigning it to one of two other data.frames based on three conditions in R

I have a data.frame (a) as shown below:



   V1 V2
1   a  b
2   a  e
3   a  f
4   b  c
5   b  e
6   b  f
7   c  d
8   c  g
9   c  h
10  d  g
11  d  h
12  e  f
13  f  g
14  g  h
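To make the example reproducible, the table above can be rebuilt with `data.frame()`. This is a sketch; `stringsAsFactors = FALSE` is an assumption so that the vertices stay plain character strings (before R 4.0 they would otherwise become factors).

```r
# Rebuild the edge list printed above.
# Each row is one edge; V1 and V2 are its two endpoint vertices.
a <- data.frame(
  V1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "e", "f", "g"),
  V2 = c("b", "e", "f", "c", "e", "f", "d", "g", "h", "g", "h", "f", "g", "h"),
  stringsAsFactors = FALSE
)
print(a)
```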


Let's assume each row represents an edge of a graph and the two values in each row are its vertices.


What I want is to pick a random row (an edge) from data.frame (a) and assign it to either data.frame (b) or data.frame (c) according to the three conditions below. To clarify, data.frames (b) and (c) are empty at the start. The conditions are:
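One way to set up the starting state, as a sketch: (b) and (c) begin as zero-row copies of (a), so they share its columns, and `sample(nrow(a))` returns a random permutation of the row indices. Walking through that permutation picks every row exactly once in random order; reading "pick a random row" this way is an assumption, since all rows must eventually be assigned.

```r
a <- data.frame(
  V1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "e", "f", "g"),
  V2 = c("b", "e", "f", "c", "e", "f", "d", "g", "h", "g", "h", "f", "g", "h"),
  stringsAsFactors = FALSE
)

# b and c start empty but keep a's column names and types.
b <- a[0, ]
c <- a[0, ]

# A random permutation of a's row indices: iterating over it visits
# each edge exactly once, in random order.
set.seed(1)  # only to make the run reproducible
order_idx <- sample(nrow(a))

first_edge <- a[order_idx[1], ]
print(first_edge)
```

Naming a data.frame `c` does not break calls to the base function `c()`, because R skips non-function objects when it looks up a function name, but a different name may be less confusing in real code.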



  1. When a row (edge) is picked at random from data.frame (a) and neither of its vertices has been assigned yet, assign the edge to the data.frame with the fewest rows.


To clarify this condition: say I pick random row (edge) #2 from data.frame (a), which has the vertices "a" and "e". I should check whether data.frame (b) or data.frame (c) contains "a" or "e" in any of its rows. If either one does, this rule does not apply and the next rule should be checked. If neither data.frame contains "a" or "e", I should compare the number of rows (nrow()) of the two data.frames and assign the row to the one with fewer rows. If both have the same nrow(), the row can go to either data.frame.
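A minimal sketch of rule 1 in R, assuming vertices are stored as character strings. The helper `has_any_vertex()` and the function `rule1()` are hypothetical names, not from the question. `rule1()` returns `"b"` or `"c"`, or `NA` when the rule does not apply so that the next rule can be tried; ties go to (b), which the question allows.

```r
# Hypothetical helper: TRUE if any cell of df holds one of the vertices.
has_any_vertex <- function(df, vertices) {
  any(unlist(df, use.names = FALSE) %in% vertices)
}

# Rule 1: if neither vertex of the edge appears in b or c, pick the
# data.frame with fewer rows ("b" on a tie). NA means "rule not applicable".
rule1 <- function(edge, b, c) {
  vertices <- unlist(edge, use.names = FALSE)
  if (has_any_vertex(b, vertices) || has_any_vertex(c, vertices)) {
    return(NA_character_)
  }
  if (nrow(b) <= nrow(c)) "b" else "c"
}

# Toy example: edge #2 ("a", "e"); b already holds edge ("c", "d"), c is empty.
edge <- data.frame(V1 = "a", V2 = "e", stringsAsFactors = FALSE)
b <- data.frame(V1 = "c", V2 = "d", stringsAsFactors = FALSE)
c <- data.frame(V1 = character(0), V2 = character(0), stringsAsFactors = FALSE)
print(rule1(edge, b, c))
```

In the toy example neither "a" nor "e" has been placed and (c) has fewer rows, so the result is "c".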



  2. When a row (edge) is picked at random from data.frame (a) and one of its vertices is already present in data.frame (b) or (c), assign the row (edge) to that data.frame.


Suppose random row #3 is picked, which has "a" and "f". Then data.frames (b) and (c) should be checked to see whether any of their rows contain "a" or "f". If data.frame (b) contains neither "a" nor "f" but data.frame (c) contains "f", the row should be assigned to data.frame (c). It is also possible that data.frame (b) contains "a" while data.frame (c) contains "f". In that case, all instances of "a" in data.frame (b) and of "f" in data.frame (c) should be counted, and the row should be assigned to the data.frame with fewer instances of its vertex. For example, if "a" appears 3 times in (b) and "f" appears 4 times in (c), the row goes to (b).
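Rule 2 could be sketched as below. The names `count_vertices()` and `rule2()` are hypothetical. One assumption goes slightly beyond the question: the count for each data.frame includes occurrences of both of the edge's vertices (in either column), and equal counts go to (b).

```r
# Hypothetical helper: number of cells of df equal to one of the vertices.
count_vertices <- function(df, vertices) {
  sum(unlist(df, use.names = FALSE) %in% vertices)
}

# Rule 2: at least one vertex of the edge is already placed.
# Only one data.frame contains them -> that one; both do -> the one
# with fewer occurrences ("b" on a tie). NA means "rule not applicable".
rule2 <- function(edge, b, c) {
  vertices <- unlist(edge, use.names = FALSE)
  nb <- count_vertices(b, vertices)
  nc <- count_vertices(c, vertices)
  if (nb == 0 && nc == 0) return(NA_character_)
  if (nc == 0) return("b")
  if (nb == 0) return("c")
  if (nb <= nc) "b" else "c"
}

# Toy example with edge #3 ("a", "f"): b holds ("a", "b"), so "a" appears
# once; c holds ("b", "f") and ("e", "f"), so "f" appears twice.
edge <- data.frame(V1 = "a", V2 = "f", stringsAsFactors = FALSE)
b <- data.frame(V1 = "a", V2 = "b", stringsAsFactors = FALSE)
c <- data.frame(V1 = c("b", "e"), V2 = c("f", "f"), stringsAsFactors = FALSE)
print(rule2(edge, b, c))
```

Here "a" occurs once in (b) and "f" twice in (c), so the edge goes to "b", mirroring the 3-versus-4 example in the question on a smaller scale.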



  3. When a row (edge) is picked at random from data.frame (a) and both of its vertices are already present in one data.frame, assign the row to that data.frame.
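Rule 3 in the same hypothetical style (`has_all_vertices()` and `rule3()` are illustrative names). Because rule 2's test ("one of the vertices is present") is also true whenever rule 3's is, it probably makes sense to check rule 3 before rule 2; that ordering, and returning `NA` when both data.frames already contain both vertices, are assumptions.

```r
# Hypothetical helper: TRUE if every vertex appears somewhere in df.
has_all_vertices <- function(df, vertices) {
  all(vertices %in% unlist(df, use.names = FALSE))
}

# Rule 3: both endpoints already sit in exactly one of b and c.
# NA means "rule not applicable" (including when both data.frames qualify).
rule3 <- function(edge, b, c) {
  vertices <- unlist(edge, use.names = FALSE)
  in_b <- has_all_vertices(b, vertices)
  in_c <- has_all_vertices(c, vertices)
  if (in_b && !in_c) return("b")
  if (in_c && !in_b) return("c")
  NA_character_
}

# Toy example with edge #5 ("b", "e"): b already holds ("a", "b") and
# ("a", "e"), so both endpoints are in b; c holds only ("c", "d").
edge <- data.frame(V1 = "b", V2 = "e", stringsAsFactors = FALSE)
b <- data.frame(V1 = c("a", "a"), V2 = c("b", "e"), stringsAsFactors = FALSE)
c <- data.frame(V1 = "c", V2 = "d", stringsAsFactors = FALSE)
print(rule3(edge, b, c))
```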


To summarize: pick a random row from data.frame (a), check it against the conditions above, and assign it to data.frame (b) or (c) accordingly. This has to be repeated until every row of data.frame (a) has been checked and assigned.
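Putting the pieces together, a possible end-to-end sketch is below. It is not a definitive answer: `choose_target()` and `assign_edges()` are hypothetical names, rule 3 is checked before rule 2, ties go to (b), and vertex counts cover both columns of each data.frame, all of which are assumptions layered on the question's description.

```r
a <- data.frame(
  V1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "e", "f", "g"),
  V2 = c("b", "e", "f", "c", "e", "f", "d", "g", "h", "g", "h", "f", "g", "h"),
  stringsAsFactors = FALSE
)

# All vertex labels in a data.frame (or a one-row edge) as a character vector.
vertices_of <- function(df) unlist(df, use.names = FALSE)

# Decide where one edge goes; always returns "b" or "c".
choose_target <- function(edge, b, c) {
  v <- vertices_of(edge)
  vb <- vertices_of(b)
  vc <- vertices_of(c)
  nb <- sum(vb %in% v)  # occurrences of the edge's vertices in b
  nc <- sum(vc %in% v)  # occurrences of the edge's vertices in c

  # Rule 1: neither vertex assigned yet -> the data.frame with fewer rows.
  if (nb == 0 && nc == 0) return(if (nrow(b) <= nrow(c)) "b" else "c")

  # Rule 3: both vertices already together in exactly one data.frame.
  both_b <- all(v %in% vb)
  both_c <- all(v %in% vc)
  if (both_b && !both_c) return("b")
  if (both_c && !both_b) return("c")

  # Rule 2: the data.frame holding the vertices; if both do, the one
  # with fewer occurrences. A data.frame with zero occurrences is never chosen.
  if (nc == 0) return("b")
  if (nb == 0) return("c")
  if (nb <= nc) "b" else "c"
}

assign_edges <- function(a) {
  b <- a[0, ]
  c <- a[0, ]
  for (i in sample(nrow(a))) {  # every row once, in random order
    edge <- a[i, ]
    if (choose_target(edge, b, c) == "b") {
      b <- rbind(b, edge)
    } else {
      c <- rbind(c, edge)
    }
  }
  list(b = b, c = c)
}

set.seed(42)  # only to make the run reproducible
res <- assign_edges(a)
print(res$b)
print(res$c)
```

Because `rbind()` keeps the row names of (a), each row in the result still shows which edge number it came from. Growing a data.frame with `rbind()` inside a loop is slow for large inputs, but at 14 edges it keeps the logic easy to follow.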




