I have a NumPy array of data that a simulation generated over time. I'm using TensorFlow and Keras to train a neural network on it, and my question concerns this line of code in my model:
model.fit(X1, Y1, epochs=1000, batch_size=100, verbose=1, shuffle=True, validation_split=0.2)
From the Keras documentation I found out that the validation set (in this case 20% of the original data) is sliced from the end of the data. Since my data is generated over time, I obviously don't want the last part to be sliced off, because it would not be representative for validation. I'd rather have the validation data chosen randomly from the whole data set. For this purpose I am currently shuffling my whole data set (inputs and outputs for the ANN in unison) before training to get random validation data.
However, I don't want to destroy the time component of my data, which is why I'm looking for a way to choose the validation set randomly without having to shuffle the whole data set. Also, I'd like to know what you think about not shuffling time-continuous data.
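To make clearer what I'm after, here is a minimal sketch of the kind of split I have in mind. It draws random validation indices, sorts them, and leaves both subsets in their original time order. The helper name random_holdout_split, the seed, and the stand-in arrays are my own assumptions for illustration, not anything Keras provides. The resulting arrays would go to fit through its validation_data argument instead of validation_split.

```python
import numpy as np

def random_holdout_split(X, Y, val_fraction=0.2, seed=0):
    """Hypothetical helper: pick a random validation subset while
    keeping the samples of both subsets in their original time order."""
    n = len(X)
    n_val = int(round(n * val_fraction))
    rng = np.random.default_rng(seed)
    # Draw validation indices without replacement, then sort them so
    # the validation samples stay in chronological order.
    val_idx = np.sort(rng.choice(n, size=n_val, replace=False))
    # Everything that was not drawn stays in the training set, in order.
    mask = np.ones(n, dtype=bool)
    mask[val_idx] = False
    train_idx = np.flatnonzero(mask)
    return X[train_idx], Y[train_idx], X[val_idx], Y[val_idx]

# Stand-in data with assumed shapes; the real X1 and Y1 would go here.
X1 = np.arange(1000, dtype=float).reshape(-1, 1)
Y1 = 2.0 * X1

X_train, Y_train, X_val, Y_val = random_holdout_split(X1, Y1)

# The fit call from above would then become (not executed here):
# model.fit(X_train, Y_train, epochs=1000, batch_size=100, verbose=1,
#           shuffle=False, validation_data=(X_val, Y_val))
```

With shuffle=False the training batches would also be fed in time order. With shuffle=True Keras would still reshuffle the training samples before each epoch, but the validation samples would stay fixed.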