I have a data pipeline in place for streaming and batch data. It collects data from various sources and lands it in Hadoop's HDFS storage. I want to generate synthetic data with the same schema as the data landed on HDFS, as soon as it is ingested.
What would be the best architecture to achieve this?
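
To make the requirement concrete, a minimal sketch of "synthetic data with the same schema" might look like the following. It is only an illustration under stated assumptions: the `SCHEMA` mapping, the column names, the type names, and the value ranges are all hypothetical, and in a real pipeline the schema would presumably be read from the metadata of the landed files rather than hard-coded. It uses only the Python standard library and does not touch HDFS.

```python
import random
import string

# Hypothetical schema (column name -> type name). Assumed for illustration;
# a real pipeline would derive this from the ingested file's metadata.
SCHEMA = {
    "user_id": "int",
    "country": "string",
    "amount": "double",
    "active": "boolean",
}


def synth_value(type_name, rng):
    """Return one random value for the given (assumed) type name."""
    if type_name == "int":
        return rng.randint(0, 1_000_000)
    if type_name == "double":
        return rng.uniform(0.0, 1000.0)
    if type_name == "boolean":
        return rng.random() < 0.5
    if type_name == "string":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    raise ValueError(f"unsupported type: {type_name}")


def synth_rows(schema, n, seed=0):
    """Generate n synthetic rows whose columns and types follow the schema."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [
        {name: synth_value(type_name, rng) for name, type_name in schema.items()}
        for _ in range(n)
    ]


if __name__ == "__main__":
    for row in synth_rows(SCHEMA, 3):
        print(row)
```

The generated values here are uniformly random and do not mimic the distributions of the real data; whether that is acceptable depends on what the synthetic data is needed for.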