Tuesday, September 15, 2015

How do I generate a random RDD in Java Spark?

Basically, I want something like this:

    int count = 100;
    // generate(...) is the kind of helper I am looking for; it is not an existing Spark method
    JavaRDD<String> myRandomRDD = generate(count, new Function<String, String>() {
        @Override
        public String call(String arg0) throws Exception {
            return RandomStringUtils.randomAlphabetic(42);
        }
    });
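
The shape asked for above amounts to "call a generator function once per index, count times". Below is a minimal stdlib-only sketch of that shape, assuming a hypothetical `generate(count, fn)` helper and a hypothetical `randomAlphabetic(rng, length)` that stands in for Commons Lang's `RandomStringUtils.randomAlphabetic` so the example runs without extra dependencies. With Spark, the analogous approach would be to parallelize a list of `count` indices with `JavaSparkContext.parallelize` and call `map` on the resulting `JavaRDD`; that part is not shown because it needs Spark on the classpath.

```java
import java.util.List;
import java.util.Random;
import java.util.function.IntFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GenerateSketch {

    private static final String LETTERS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Hypothetical stand-in for RandomStringUtils.randomAlphabetic(length),
    // drawing each character uniformly from LETTERS.
    static String randomAlphabetic(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(LETTERS.charAt(rng.nextInt(LETTERS.length())));
        }
        return sb.toString();
    }

    // Hypothetical generate(count, fn): calls fn once for each index 0..count-1.
    // In Spark the analogous step would be parallelize(indices).map(fn).
    static <T> List<T> generate(int count, IntFunction<T> fn) {
        return IntStream.range(0, count)
                .mapToObj(fn)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int count = 100;
        Random rng = new Random(42L); // fixed seed for reproducible output
        List<String> values = generate(count, i -> randomAlphabetic(rng, 42));
        System.out.println(values.size() + " strings, first = " + values.get(0));
    }
}
```

The index argument is ignored here, which matches the original intent of producing independent random values; passing it to the generator would be one way to derive a per-element seed.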

In theory I could use Spark's RandomRDDs, but I can't get it working. I'm overwhelmed by the choices: should I use RandomRDDs::randomRDD or RandomRDDs::randomVectorRDD? Or should I use RandomVectorRDD?

I have tried the following, but I can't even get the syntax right.

    RandomRDDs.randomRDD(jsc, new RandomDataGenerator<String>() {

        @Override
        public void setSeed(long arg0) {
            // TODO: seed the underlying random source with arg0

        }

        @Override
        public org.apache.spark.mllib.random.RandomDataGenerator<String> copy() {
            // TODO: should return a new, independent generator instance
            return null;
        }

        @Override
        public String nextValue() {
            // produce one random 42-letter string per call
            return RandomStringUtils.randomAlphabetic(42);
        }
    }, count, ??);
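
A generator passed to RandomRDDs.randomRDD has to honor two contracts that the stubs above leave empty. As far as I understand Spark's implementation, each partition calls copy() and then setSeed(...) on the copy before drawing values, so copy() must return a fresh, independent instance (returning null would fail at runtime) and setSeed must actually seed the randomness. The stdlib-only sketch below illustrates that contract. `StringGenerator` is a hypothetical local interface mirroring the three methods of `org.apache.spark.mllib.random.RandomDataGenerator<T>`, and `generatePartitions` only imitates per-partition seeding; it is not Spark's actual code. Separately, I believe that from Java the Scala method randomRDD exposes no default arguments and expects its implicit ClassTag as an explicit final argument, and that its first parameter is a SparkContext (jsc.sc() for a JavaSparkContext). Treat that as an assumption and check the Javadoc for your Spark version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class GeneratorSketch {

    // Hypothetical local stand-in for Spark's RandomDataGenerator<T>
    // (same three methods), so this compiles without Spark.
    interface StringGenerator {
        String nextValue();
        StringGenerator copy();
        void setSeed(long seed);
    }

    // Produces alphabetic strings of a fixed length from a seedable Random.
    static class AlphabeticGenerator implements StringGenerator {
        private static final String LETTERS =
                "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        private final int length;
        private final Random rng = new Random();

        AlphabeticGenerator(int length) {
            this.length = length;
        }

        @Override
        public String nextValue() {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(LETTERS.charAt(rng.nextInt(LETTERS.length())));
            }
            return sb.toString(); // the value must actually be returned
        }

        @Override
        public StringGenerator copy() {
            // A new, independent instance with the same configuration; never null.
            return new AlphabeticGenerator(length);
        }

        @Override
        public void setSeed(long seed) {
            rng.setSeed(seed);
        }
    }

    // Imitates per-partition generation: each partition gets its own copy,
    // seeded from a master Random, and emits its share of `count` values.
    static List<String> generatePartitions(StringGenerator gen, long count,
                                           int numPartitions, long seed) {
        Random master = new Random(seed);
        List<String> out = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            long start = count * p / numPartitions;
            long end = count * (p + 1) / numPartitions;
            StringGenerator local = gen.copy();
            local.setSeed(master.nextLong());
            for (long i = start; i < end; i++) {
                out.add(local.nextValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = generatePartitions(new AlphabeticGenerator(42), 100, 4, 7L);
        List<String> b = generatePartitions(new AlphabeticGenerator(42), 100, 4, 7L);
        System.out.println(a.size() + " values, reproducible: " + a.equals(b));
    }
}
```

Because every partition works on its own seeded copy, the same seed reproduces the same values regardless of the order in which partitions run, which is presumably why the interface asks for copy() and setSeed at all.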

The documentation is sparse, I'm confused, and I would appreciate any help.

Thanks!



