Monday, November 28, 2016

Sample an unevenly distributed set for training

I'm training an SGD neural-net classifier on a very imbalanced dataset. To compensate for underrepresented classes, I perform the actual training on a set randomly sampled such that classes with fewer examples get picked more often.
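For concreteness, here is a minimal sketch of that kind of class-balanced sampling in Python/NumPy; the function and variable names (balanced_sample_indices, y) are my own and only illustrate the idea, not my actual pipeline.

```python
import numpy as np

def balanced_sample_indices(y, n_samples, rng=None):
    """Draw indices with probability inversely proportional to class size,
    so classes with fewer examples get picked more often."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    # Weight each example by 1 / (size of its class), then normalize.
    class_weight = {c: 1.0 / n for c, n in zip(classes, counts)}
    weights = np.array([class_weight[label] for label in y])
    weights /= weights.sum()
    return rng.choice(len(y), size=n_samples, replace=True, p=weights)

# Example: a 3-class dataset with a 100:10:1 imbalance.
y = np.concatenate([np.zeros(1000), np.ones(100), np.full(10, 2)]).astype(int)
idx = balanced_sample_indices(y, n_samples=3000, rng=0)
print(np.bincount(y[idx]))  # roughly 1000 draws per class
```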

What is a principled way to choose the size of this resampled set versus the number of epochs it will be trained for? Advice much appreciated.



