Monday, December 24, 2018

What is the best way to generate synthetic data for data science development?

I have a data pipeline in place for streaming and batch data. It collects data from various sources and lands it in Hadoop's HDFS storage. As soon as the data is ingested, I want to generate synthetic data with the same schema as the data landed on HDFS.

What would be the best architecture to achieve this?
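Whatever the surrounding architecture, one likely building block is a generator that takes the schema of the landed data and emits records of the same shape. The sketch below is only that piece, under several assumptions: the schema is reduced to a hypothetical `field name -> type tag` dictionary (deriving it from the real HDFS data, for example from an Avro or Hive table definition, is not shown), `generate_synthetic` and `on_batch_ingested` are illustrative names rather than part of any Hadoop API, and `on_batch_ingested` stands in for wherever the pipeline would call it once a batch has landed.

```python
import random
import string
from datetime import date, timedelta


# Hypothetical, simplified schema format: field name -> type tag.
# Supported tags: "int", "float", "string", "bool", "date".
# Mapping a real HDFS/Avro/Hive schema onto this format is assumed, not shown.

def _random_value(type_tag, rng):
    """Draw one uniformly random value for the given type tag."""
    if type_tag == "int":
        return rng.randint(0, 1_000_000)
    if type_tag == "float":
        return rng.uniform(0.0, 1000.0)
    if type_tag == "string":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    if type_tag == "bool":
        return rng.random() < 0.5
    if type_tag == "date":
        # Any day in 2018 (Jan 1 plus 0..364 days).
        return date(2018, 1, 1) + timedelta(days=rng.randint(0, 364))
    raise ValueError(f"unsupported type: {type_tag}")


def generate_synthetic(schema, n_rows, seed=None):
    """Return n_rows dicts whose keys and value types follow `schema`.

    A fixed seed makes the output reproducible, which helps in tests.
    """
    rng = random.Random(seed)
    return [
        {field: _random_value(type_tag, rng) for field, type_tag in schema.items()}
        for _ in range(n_rows)
    ]


def on_batch_ingested(schema, ingested_rows):
    """Hypothetical hook run after a batch lands: build a synthetic
    batch of the same size and schema as the ingested one."""
    return generate_synthetic(schema, len(ingested_rows))


if __name__ == "__main__":
    example_schema = {
        "user_id": "int",
        "amount": "float",
        "country": "string",
        "active": "bool",
        "signup": "date",
    }
    for row in generate_synthetic(example_schema, 3, seed=42):
        print(row)
```

This version only matches field names and types. Values are drawn uniformly at random, so they do not reproduce the distributions or correlations of the real data. Whether that is acceptable depends on what the synthetic data is used for, and a statistical or model-based generator would be a different technique from this one.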



