Wednesday, September 20, 2017

How to take the same random subset from two different data frames for comparison?

I would like to compare two different data frames. Both have the same number of rows and columns. The first data frame (1) contains purchase probabilities between 0 and 1, whereas the second data frame (2) is binary-coded and records the actual purchases made by each user.

My problem is how to take a RANDOM subset of df (1) that covers EXACTLY THE SAME users and products in df (2), so that I can compare the two subsets.

For example, how can I take a subset of 100 users (rows) and two products (columns) from df (1), together with the same 100 users and two products from df (2)?
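Since both data frames have the same dimensions and, presumably, the same row and column order, one option is to draw the random row and column indices once and apply that same pair of index vectors to both frames. The sketch below assumes that positional alignment; sample_same_subset is a hypothetical helper name, and the toy data only mimics the shape of the question's data (0.04 is the purchase rate implied by 800 non-zero cells out of 20,000).

```r
# Hypothetical helper: draws ONE random set of row and column indices and
# applies it to both data frames, so the two subsets cover the same cells.
# Assumes both frames have the same dimensions and the same row/column order.
sample_same_subset <- function(df_a, df_b, n_rows, n_cols) {
  stopifnot(identical(dim(df_a), dim(df_b)))
  rows <- sample(nrow(df_a), n_rows)   # e.g. 100 random users
  cols <- sample(ncol(df_a), n_cols)   # e.g. 2 random products
  list(a = df_a[rows, cols, drop = FALSE],   # drop = FALSE keeps a data frame
       b = df_b[rows, cols, drop = FALSE],
       rows = rows, cols = cols)
}

# Toy data with the same shape as in the question (illustrative only)
set.seed(1)
probs  <- data.frame(matrix(runif(20000), nrow = 1000, ncol = 20))
binary <- data.frame(matrix(rbinom(20000, 1, 0.04), nrow = 1000, ncol = 20))

sub <- sample_same_subset(probs, binary, n_rows = 100, n_cols = 2)
dim(sub$a)                                   # 100 rows, 2 columns
identical(rownames(sub$a), rownames(sub$b))  # same users in both subsets
identical(names(sub$a), names(sub$b))        # same products in both subsets
```

Because the indices are drawn only once, there is no need for a shared seed between two separate sample() calls; the row names of each subset also record which original users were picked.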

Is this possible as is, or do I have to restructure my data frames first?
If it matters, both data frames can be joined by user_ID.
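If the rows are not guaranteed to be in the same order, indexing by position is unsafe and selecting by user_ID is more robust. A minimal sketch, assuming each frame has a user_ID column with one row per user (the toy data below adds one; the data generated in the question does not have it): draw the IDs and product columns once, then use match() to pull the same users from each frame in the same order. merge() is shown as an alternative that puts both subsets into a single frame.

```r
# Toy data with a hypothetical user_ID column; binary is deliberately
# stored in a different row order than probs.
set.seed(2)
n_users <- 1000
probs  <- data.frame(user_ID = 1:n_users, matrix(runif(n_users * 20), ncol = 20))
binary <- data.frame(user_ID = sample(n_users),   # shuffled IDs
                     matrix(rbinom(n_users * 20, 1, 0.04), ncol = 20))

# Draw the random users and products once
ids      <- sample(probs$user_ID, 100)
products <- sample(setdiff(names(probs), "user_ID"), 2)

# Select by ID; match() returns row positions in the order of ids,
# so row i refers to the same user in both subsets
sub_probs  <- probs[match(ids, probs$user_ID), c("user_ID", products)]
sub_binary <- binary[match(ids, binary$user_ID), c("user_ID", products)]

# Alternative: one combined frame; suffixes mark the source of each column
merged <- merge(sub_probs, sub_binary, by = "user_ID",
                suffixes = c("_prob", "_bought"))
head(merged)
```

Note that merge() sorts the result by user_ID by default, so its row order differs from ids; the match() version preserves the sampled order.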

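library("Matrix")  # provides rsparsematrix()
set.seed(42)       # make the random example reproducible
# FIRST DF: purchase probabilities between 0 and 1 (1000 users x 20 products)
df_probabilities <- data.frame(matrix(runif(20000, 0, 1), nrow = 1000, ncol = 20))
# SECOND DF: binary purchases (1 = bought), same dimensions, 800 non-zero cells
df_binary <- data.frame(as.matrix(rsparsematrix(1000, 20, nnz = 800, rand.x = runif)))
df_binary[df_binary > 0] <- 1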
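Starting from the generation code in the question, the comparison itself can be laid out in long format, with one row per sampled (user, product) cell holding the predicted probability next to the actual purchase. This is a sketch: the seed, the column names user, product, probability and purchased, and the mean-by-outcome summary at the end are illustrative choices, not part of the question.

```r
library("Matrix")  # for rsparsematrix(), as in the question
set.seed(42)       # illustrative seed
df_probabilities <- data.frame(matrix(runif(20000, 0, 1), nrow = 1000, ncol = 20))
df_binary <- data.frame(as.matrix(rsparsematrix(1000, 20, nnz = 800, rand.x = runif)))
df_binary[df_binary > 0] <- 1

# Draw the subset once: 100 users and 2 products, shared by both frames
rows <- sample(nrow(df_probabilities), 100)
cols <- sample(ncol(df_probabilities), 2)

# unlist() on a data frame concatenates column by column, so users repeat
# within each product block and products repeat 'each' = number of users
comparison <- data.frame(
  user        = rep(rows, times = length(cols)),
  product     = rep(names(df_probabilities)[cols], each = length(rows)),
  probability = unlist(df_probabilities[rows, cols], use.names = FALSE),
  purchased   = unlist(df_binary[rows, cols], use.names = FALSE)
)
head(comparison)

# One possible summary: mean predicted probability for bought vs. not bought
aggregate(probability ~ purchased, data = comparison, FUN = mean)
```

With the purely random data above, the two means should be close to each other; on real predictions one would hope for a higher mean among purchased cells.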



