Monday, 12 December 2016

Generate correlated data in Java

Hello Stack Overflow community,

I implemented a skyline algorithm in Java. Now I want to test it with both real and generated data, so I want to generate correlated, anti-correlated, independent and Gaussian data. Each data set should have d dimensions and a maximum value for each dimension.

I want to use the idea from the paper "The Skyline Operator" (2001):

A correlated database represents an environment in which points which are good in one dimension are also good in the other dimensions. For instance, students which have a good publication record typically also do well in their preliminaries. We generate a random point in the correlated database as follows. First we select a plane perpendicular to the line from (0, ..., 0) to (1, ..., 1) using a normal distribution; the new point will be in that plane. We use a normal distribution to select the plane so that more points are in the middle than at the ends. Within the plane, the individual attribute values are again generated using a normal distribution; this makes sure that most points are located close to the line from (0, ..., 0) to (1, ..., 1).
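To make my reading of that paragraph concrete, here is a minimal sketch of how I understand the procedure. It is not the paper's code. The class name, the method name and the two spread constants are my own assumptions (the text gives no concrete values), and I assume that "within the plane" can be realised by adding a Gaussian offset to one coordinate and subtracting it from the next, which keeps the coordinate sum constant and therefore stays in a plane perpendicular to the diagonal.

```java
import java.util.Arrays;
import java.util.Random;

class CorrelatedPointSketch {
    // Assumed spreads; the quoted paragraph does not specify them.
    static final double PLANE_STDDEV = 0.15;     // spread of the plane position along the diagonal
    static final double IN_PLANE_FACTOR = 0.15;  // spread inside the plane, relative to the distance to the nearer end

    /** Returns one point whose i-th coordinate lies in [0, maxValues[i]]. */
    static double[] nextCorrelated(Random rnd, int[] maxValues) {
        int d = maxValues.length;
        double[] x = new double[d];
        while (true) {
            // Step 1: pick the plane through (v, ..., v); normal around the middle of the diagonal.
            double v = 0.5 + PLANE_STDDEV * rnd.nextGaussian();
            if (v < 0.0 || v > 1.0) {
                continue; // plane would miss the unit cube's diagonal, draw again
            }
            Arrays.fill(x, v);

            // Step 2: move inside the plane. Adding h to one coordinate and
            // subtracting it from the next leaves the coordinate sum unchanged.
            double spread = IN_PLANE_FACTOR * Math.min(v, 1.0 - v);
            for (int i = 0; i < d; i++) {
                double h = spread * rnd.nextGaussian();
                x[i] += h;
                x[(i + 1) % d] -= h;
            }

            // Step 3: keep the point only if it is still inside the unit cube.
            boolean inside = true;
            for (double value : x) {
                if (value < 0.0 || value > 1.0) {
                    inside = false;
                    break;
                }
            }
            if (inside) {
                break;
            }
        }

        // Scale each coordinate from [0, 1] to [0, maxValues[i]].
        double[] result = new double[d];
        for (int i = 0; i < d; i++) {
            result[i] = x[i] * maxValues[i];
        }
        return result;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);
        int[] maxValues = { 100, 100, 100, 100, 100 };
        for (int k = 0; k < 5; k++) {
            System.out.println(Arrays.toString(nextCorrelated(rnd, maxValues)));
        }
    }
}
```

Larger values of the two constants should spread the points further from the diagonal, but I have not tuned them against the paper's figures.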

I have already found the following solution on the internet. However, I don't quite understand it, especially why the Gaussian value is divided by 7. I thought you would need to work with a correlation coefficient.

import java.util.Random;

public static double[] nextVal() {
    Random generator = new Random(System.nanoTime());
    // maximum value for each dimension
    int[] maxValues = { 100, 100, 100, 100, 100 };
    double[] result = new double[maxValues.length];

    // create first value:
    double first = generator.nextDouble();

    double dist = 0.5 - Math.abs(0.5 - first);
    for (int i = 0; i < result.length; i++) {
        double candidate = Math.sqrt(dist) * (generator.nextGaussian() / 7);
        candidate = first + candidate;
        result[i] = candidate;
        if (result[i] < 0.0 || result[i] > 1.0) {
            --i;
        }
    }
    for (int i = 0; i < maxValues.length; i++) {
        result[i] = (int) (maxValues[i] * result[i] + 1);
    }
    return result;
}
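Since I expected a correlation coefficient, one thing I can do is measure it. The sketch below repeats the logic of the snippet with two changes that are my own additions for the experiment: the Random and the divisor are passed in as parameters, and the values stay in [0, 1] instead of being scaled to integers. It then computes the sample Pearson correlation between the first two dimensions for a few divisors. The class and method names are made up for this test.

```java
import java.util.Random;

class CorrelationCheckSketch {
    // Same idea as nextVal(): a shared base value plus Gaussian noise per dimension.
    // The divisor (7 in the snippet) is a parameter here so it can be varied.
    static double[] nextUnitPoint(Random rnd, int dims, double divisor) {
        double[] result = new double[dims];
        double first = rnd.nextDouble();
        double dist = 0.5 - Math.abs(0.5 - first);
        for (int i = 0; i < dims; i++) {
            double candidate;
            do {
                // Redraw until the value falls inside [0, 1], like the --i retry in the snippet.
                candidate = first + Math.sqrt(dist) * (rnd.nextGaussian() / divisor);
            } while (candidate < 0.0 || candidate > 1.0);
            result[i] = candidate;
        }
        return result;
    }

    // Sample Pearson correlation coefficient of two equally long arrays.
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0, meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }

    // Generates n two-dimensional points and correlates dimension 0 with dimension 1.
    static double sampleCorrelation(long seed, int n, double divisor) {
        Random rnd = new Random(seed);
        double[] a = new double[n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
            double[] p = nextUnitPoint(rnd, 2, divisor);
            a[i] = p[0];
            b[i] = p[1];
        }
        return pearson(a, b);
    }

    public static void main(String[] args) {
        for (double divisor : new double[] { 1.0, 3.0, 7.0, 20.0 }) {
            System.out.printf("divisor=%.0f  r=%.3f%n", divisor, sampleCorrelation(1L, 100_000, divisor));
        }
    }
}
```

This does not tell me where the 7 comes from, but it should at least show how the divisor relates to the correlation between dimensions.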

If someone can explain this to me or give me some tips on how to do this, I would really appreciate it.

Thank you very much in advance.



