Monday, 13 August 2018

Randomness Comparison Experiment

I have a drug analysis experiment that needs to generate a value based on a given drug database and a set of 1000 random experiments.

The original database looks like this, where the numbers in each column represent the gene's rank for that drug. This is a simplified version; the actual database has more drugs and more genes.

+-------+-------+-------+
| Genes | DrugA | DrugB |
+-------+-------+-------+
| A     |     1 |     3 |
| B     |     2 |     1 |
| C     |     4 |     5 |
| D     |     5 |     4 |
| E     |     3 |     2 |
+-------+-------+-------+

A score is calculated from the user's input, here genes A and C, using the following formula:

# Compute Function
# ['A','C'] as array input

computeFunction(array) {
    # do some stuff with the array ...
}

The same formula is used for any input values.

For the randomness test, each experiment requires the algorithm to assign randomized values to A and C, so A and C can each take any rank from 1 to 5.

I have two methods of selecting values to generate the 1000 sets for the p-value calculation. I would like someone to point out whether one is better than the other, or whether there is a way to compare the two methods.
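For context, this is roughly how I understand the p-value step that both methods feed into. It is only a sketch: the function name `empirical_p_value` is made up for illustration, and it assumes that a lower score is the more extreme direction (since lower ranks come first). Flip the comparison if higher scores are the extreme ones.

```python
def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value from randomized scores.

    Assumption: lower scores are more extreme. The add-one correction
    keeps the estimate from ever being exactly zero.
    """
    # Count randomized scores at least as extreme as the observed one.
    hits = sum(1 for s in null_scores if s <= observed)
    return (hits + 1) / (len(null_scores) + 1)
```

Here `null_scores` would be the 1000 values of computeFunction() from the randomized experiments, and `observed` the value from the real A and C.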

Method 1

Generate 1000 randomized databases from the original database shown above, so that each table contains a different assignment of ranks to genes.

Example of one of the 1000 randomized databases:

+-------+-------+-------+
| Genes | DrugA | DrugB |
+-------+-------+-------+
| A     |     2 |     3 |
| B     |     4 |     4 |
| C     |     3 |     2 |
| D     |     1 |     5 |
| E     |     5 |     1 |
+-------+-------+-------+

Next, we run computeFunction() with the new A and C values.
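A minimal sketch of Method 1, assuming each drug column is shuffled independently (which is what the example table above looks like). The names `shuffle_database` and `method1_draws` are my own, and the real computeFunction() would be applied to each draw.

```python
import random

# Original rank table from the post: one gene -> rank mapping per drug.
ORIGINAL = {
    "DrugA": {"A": 1, "B": 2, "C": 4, "D": 5, "E": 3},
    "DrugB": {"A": 3, "B": 1, "C": 5, "D": 4, "E": 2},
}

def shuffle_database(db, rng):
    """Return a new table with each drug's ranks permuted across genes."""
    shuffled = {}
    for drug, ranks in db.items():
        genes = list(ranks)
        values = list(ranks.values())
        rng.shuffle(values)  # assumption: columns are shuffled independently
        shuffled[drug] = dict(zip(genes, values))
    return shuffled

def method1_draws(db, query, n, rng):
    """Build n shuffled tables and keep the ranks of the query genes."""
    draws = []
    for _ in range(n):
        table = shuffle_database(db, rng)
        draws.append({drug: [ranks[g] for g in query]
                      for drug, ranks in table.items()})
    return draws

rng = random.Random(0)  # fixed seed so the run is repeatable
method1_sample = method1_draws(ORIGINAL, ["A", "C"], 1000, rng)
```

Each entry of `method1_sample` holds the randomized ranks for A and C per drug, ready to be passed to computeFunction().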

Method 2

Pick random genes from the original database and use their values as the new randomized values for the input genes.

For example, we pick the values of E and B as the new values for A and C.

In the original database (DrugA column), E is 3 and B is 2.

So now A is 3 and C is 2. Next, we run computeFunction() with the new A and C values.
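A minimal sketch of Method 2 under two assumptions that the description above leaves open: the donor genes are distinct (sampling without replacement), and the same donors are used for every drug column. The names `ranks_from_donors` and `method2_draws` are made up for illustration.

```python
import random

# Original rank table from the post: one gene -> rank mapping per drug.
ORIGINAL = {
    "DrugA": {"A": 1, "B": 2, "C": 4, "D": 5, "E": 3},
    "DrugB": {"A": 3, "B": 1, "C": 5, "D": 4, "E": 2},
}
GENES = ["A", "B", "C", "D", "E"]

def ranks_from_donors(db, donors):
    """Look up the original ranks of the donor genes in every drug column."""
    return {drug: [ranks[d] for d in donors] for drug, ranks in db.items()}

def method2_draws(db, genes, query, n, rng):
    """Draw n donor sets and return the ranks they lend to the query genes."""
    draws = []
    for _ in range(n):
        # Assumption: donors are distinct, one donor per query gene.
        donors = rng.sample(genes, len(query))
        draws.append(ranks_from_donors(db, donors))
    return draws

rng = random.Random(0)  # fixed seed so the run is repeatable
method2_sample = method2_draws(ORIGINAL, GENES, ["A", "C"], 1000, rng)
```

With donors E and B, `ranks_from_donors(ORIGINAL, ["E", "B"])["DrugA"]` gives `[3, 2]`, matching the example above.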

Summary

Since both methods produce randomized input, it seems to me that they will produce similar sets of 1000 outcome values. Is there any way I could prove they are similar?
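One way to check this on the toy table is to enumerate every possible outcome of each method for a single drug column and compare the exact score distributions. The sketch below does that for DrugA. The `score` function is a hypothetical stand-in for computeFunction() (a plain sum of ranks), and Method 2 is assumed to pick distinct donor genes.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

DRUG_A = {"A": 1, "B": 2, "C": 4, "D": 5, "E": 3}
QUERY = ["A", "C"]

def score(ranks):
    # Hypothetical stand-in for computeFunction(): sum of the ranks.
    return sum(ranks)

def normalise(counter):
    """Turn outcome counts into exact probabilities."""
    total = sum(counter.values())
    return {k: Fraction(v, total) for k, v in counter.items()}

genes = list(DRUG_A)
values = list(DRUG_A.values())

# Method 1: every permutation of the column, scored on the query genes.
m1 = Counter()
for perm in permutations(values):
    shuffled = dict(zip(genes, perm))
    m1[score([shuffled[g] for g in QUERY])] += 1

# Method 2: every ordered choice of distinct donor genes.
m2 = Counter()
for donors in permutations(genes, len(QUERY)):
    m2[score([DRUG_A[d] for d in donors])] += 1

dist1 = normalise(m1)
dist2 = normalise(m2)
same = dist1 == dist2  # whether the two exact distributions coincide
```

Under these assumptions the two methods give the same distribution for a single column, because a uniformly shuffled column hands the query genes a uniformly random set of distinct ranks, which is also what distinct donors provide. This check does not cover a computeFunction() that combines several drug columns: shuffling each column independently and reusing one donor gene across all columns need not agree there. For the full database, where enumeration is not feasible, one option would be to compare the two sets of 1000 scores with a two-sample test such as Kolmogorov-Smirnov or a permutation test.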
