Monday, May 4, 2020

ML: Model retraining in practice

Assume that a bank does not yet have a credit scoring model. Detailed historical data are available and can be used to build a classification model. This model will then be used for the next six months.

After six months, the model should be updated. What is the best way to do this? At that point, outcome data exist only for the customers who were not rejected by the current model. For the rejected customers, it is unknown whether they would actually have been creditworthy. The new sample is therefore biased: it contains only the customers whom the current model marked as "good". Should I still use this data for retraining?

Or should the process have been set up from the beginning so that, for example, a random 10% of customers are not rated by the model at all (and are granted a loan), so that an unbiased sample is available for retraining?
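To make the two options concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function name `simulate`, the uniform risk distribution, the noise level of the score, the 0.5 cutoff and the 10% random-approval rate are all hypothetical and not taken from any real bank's setup. It only shows why the default rate observed among model-approved customers differs from that of the whole applicant population, while a randomly approved group stays representative.

```python
import random


def simulate(n=20000, explore_rate=0.10, threshold=0.5, seed=0):
    """Compare default rates in three groups of simulated applicants.

    Returns (population_rate, model_approved_rate, random_approved_rate).
    All distributions and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    population, model_approved, random_approved = [], [], []

    for _ in range(n):
        # True default probability of the applicant (unknown in practice).
        risk = rng.random()
        # The current model sees only a noisy estimate of that risk.
        score = risk + rng.gauss(0, 0.1)
        # Outcome that would be observed if the loan were granted.
        defaulted = rng.random() < risk

        # In reality this outcome is NOT observable for rejected applicants;
        # it is recorded here only to have a ground truth to compare against.
        population.append(defaulted)

        if rng.random() < explore_rate:
            # Random group: approved without consulting the model.
            random_approved.append(defaulted)
        elif score < threshold:
            # Regular path: approved because the model rates the risk as low.
            model_approved.append(defaulted)
        # Otherwise the applicant is rejected and no outcome is logged.

    def rate(outcomes):
        return sum(outcomes) / len(outcomes)

    return rate(population), rate(model_approved), rate(random_approved)


if __name__ == "__main__":
    pop, model, rand = simulate()
    print(f"whole population:  {pop:.3f}")
    print(f"model-approved:    {model:.3f}")
    print(f"randomly approved: {rand:.3f}")
```

Under these assumptions, the model-approved group shows a much lower default rate than the applicant population, whereas the randomly approved group tracks the population closely. The price of the random group is that some loans are knowingly granted to applicants the model would have rejected.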

Best regards and thank you very much!



