Tuesday, 4 December 2018

Sample and split a dataset for multinomial image classification when each class's images are stored in a separate folder

How do you sample (without replacement) from several folders of image files, where each image belongs to the class named by the folder it is stored in, so that the relative proportion of each class is preserved in the sample?

For example, you have 4 classes: dog, cat, bird, turtle. There are 1000 dogs, 200 cats, 200 birds, 1400 turtles.

  • dogs
      |--img3487.png
      |--img2764.png
      ...
      |--img5773.png

  • cats
      |--img7701.png
      |--img5429.png
      ...
      |--img2716.png

  • birds
      |--img5232.png
      |--img6705.png

  • turtles
      |--img2601.png
      |--img7748.png

You want to ensure that when you split the dataset into, say, 70/10/20 train/validation/test sets, each animal's folder contributes the same proportion of its images to each set, so the class balance in every set matches the full dataset.
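One possible way to do this is to shuffle and slice each class folder separately (a per-class, or stratified, split). The sketch below is only an illustration under some assumptions: the function names `split_counts` and `stratified_split`, the `root` argument, the fixed `seed`, and the rule of giving any rounding remainder to the test set are all hypothetical choices, not something stated in the question.

```python
import random
from pathlib import Path


def split_counts(n, fractions=(0.7, 0.1, 0.2)):
    """Return (train, val, test) sizes that always sum to n.

    Assumption: train and val sizes are rounded, and the test set
    receives whatever is left over.
    """
    n_train = min(round(n * fractions[0]), n)
    n_val = min(round(n * fractions[1]), n - n_train)
    return n_train, n_val, n - n_train - n_val


def stratified_split(root, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split every class folder under `root` with the same fractions.

    Returns a dict mapping "train", "val", "test" to lists of
    (file_path, class_label) pairs. Each file lands in exactly one
    list, so this is sampling without replacement.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "val": [], "test": []}
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        label = class_dir.name  # the folder name is the class label
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train, n_val, _ = split_counts(len(files), fractions)
        splits["train"] += [(f, label) for f in files[:n_train]]
        splits["val"] += [(f, label) for f in files[n_train:n_train + n_val]]
        splits["test"] += [(f, label) for f in files[n_train + n_val:]]
    return splits
```

With the counts from the example, `split_counts(1000)` gives `(700, 100, 200)` for dogs and `split_counts(200)` gives `(140, 20, 40)` for cats and for birds, so every set keeps the original class ratio. If scikit-learn is available, calling `train_test_split` twice with its `stratify` argument on a flat list of paths and labels is a common alternative.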



