Monday, December 4, 2017

Sampling ratio for an imbalanced dataset

I have an imbalanced dataset with two classes (+1, -1); the positives make up only 7% of the data. I want to classify it using decision trees. I tried downsampling the negatives to:
1. the same size as the positives;
2. double or triple the size of the positives.
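A minimal sketch of the downsampling step, assuming the data is held as plain Python lists `X` (feature rows) and `y` (labels +1/-1). The function name `downsample_negatives` and its `ratio` parameter are hypothetical, not taken from any library. It keeps every positive and draws a random subset of negatives without replacement, so `ratio=1.0` gives the "same size" sample and `ratio=2.0` or `3.0` gives the double or triple variants.

```python
import random

def downsample_negatives(X, y, ratio=1.0, seed=0):
    """Keep all positives (+1) and a random subset of negatives (-1).

    Hypothetical helper: `ratio` is the number of negatives kept per positive
    (1.0 -> same size, 2.0 -> double, 3.0 -> triple).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    # Never ask for more negatives than actually exist.
    n_neg = min(len(neg), int(round(ratio * len(pos))))
    keep = pos + rng.sample(neg, n_neg)  # sampling without replacement
    rng.shuffle(keep)  # mix positives and negatives
    return [X[i] for i in keep], [y[i] for i in keep]
```

With 1,000 rows of which 70 are positive, `ratio=1.0` yields 140 rows and `ratio=3.0` yields 280.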

For all of them I got almost the same precision, but the recall of the positives was much better for the first sample (negatives the same size as the positives). I feel I'm missing something here, though: what is bad about this sampling?
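One thing that may be worth checking (an assumption, since the post does not say how the test set was built): precision depends on class prevalence, while recall does not. If precision was measured on a test set downsampled the same way as the training set, it will not reflect the 7% prevalence of the original data. The sketch below computes precision and recall from label lists, and rescales precision to a chosen prevalence π using precision = TPR·π / (TPR·π + FPR·(1 − π)), where TPR is the recall on the positives and FPR is the false-positive rate on the negatives. The function names are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the `positive` class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def precision_at_prevalence(tpr, fpr, prevalence):
    """Expected precision when positives make up `prevalence` of the data.

    tpr: true-positive rate (recall on positives)
    fpr: false-positive rate (share of negatives predicted positive)
    """
    num = tpr * prevalence
    den = num + fpr * (1 - prevalence)
    return num / den if den else 0.0
```

For example, a hypothetical classifier with TPR = 0.8 and FPR = 0.1 has a precision of about 0.89 on a balanced test set but only about 0.38 at 7% prevalence, even though its recall is 0.8 in both cases.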



