Wednesday, May 23, 2018

Hyperparameter search: different results on different machines using the same random seed

I'm doing a Bayesian hyperparameter search (scikit-optimize's gp_minimize() function) for an MLPClassifier. I noticed that when I run the script on a second machine, I get different results (the first difference appears at iteration 12).

If I rerun the script on machine 1 or machine 2, the results are reproducible on that machine. However, they differ across machines (I tried three). The datasets are identical: I dumped them with pickle and load the same dump on every machine.
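To rule out the data entirely, a hash comparison across machines is one quick sanity check (a minimal sketch; the file name data.pkl and the assumption that the dump holds an (X, y) tuple are mine, not from the actual script):

import hashlib
import pickle

# Identical dumps must produce identical digests on every machine.
with open('data.pkl', 'rb') as f:
    raw = f.read()
print('SHA-256:', hashlib.sha256(raw).hexdigest())

# Load the arrays the same way the search script does.
X, y = pickle.loads(raw)
print('X shape:', X.shape, 'y shape:', y.shape)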

I'm not sure whether some divergence is normal, whether there is another source of randomness in my code, or whether something else entirely is going on. The one thing I have checked is the library versions of numpy, scikit-learn and scikit-optimize, which are all the same.
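Concretely, the version check looks like this on each machine (a minimal sketch; extending it to the Python build, OS and BLAS configuration is my assumption about where otherwise-identical setups might still differ):

import platform

import numpy
import sklearn
import skopt

# Package versions: these already match on all three machines.
print('numpy          :', numpy.__version__)
print('scikit-learn   :', sklearn.__version__)
print('scikit-optimize:', skopt.__version__)

# Lower-level details that can differ even when package versions match.
print('python  :', platform.python_version())
print('platform:', platform.platform())
numpy.show_config()  # shows which BLAS/LAPACK numpy was built against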

Code:

from time import time

from skopt import gp_minimize
from skopt.space import Categorical, Real

# hidden_layers is a list of candidate hidden_layer_sizes tuples,
# e.g. the (50, 22)-style tuples visible in the logs; defined elsewhere (not shown).
space = [
    Categorical(['constant', 'invscaling', 'adaptive'], name='learning_rate'),
    Categorical(['lbfgs', 'sgd', 'adam'], name='solver'),
    Categorical(['identity', 'logistic', 'tanh', 'relu'], name='activation'),
    Real(10 ** -7, 10 ** 1, 'log-uniform', name='alpha'),
    Real(10 ** -7, 10 ** -2, name='tol'),
    Categorical([200, 500, 1000, 1500, 2000], name='max_iter'),
    Categorical(hidden_layers, name='hidden_layer_sizes'),
]

if __name__ == '__main__':
    # print log headline (one column per dimension in `space`)
    print('{:3}{:>10}{:>10}{:>12}{:>7}{:>10}{:>13}{:>11}{:>6}{:>11}'.format(
        '#', 'max(score)', 'score', 'Arg1', 'Arg2', 'Arg3', 'Arg4', 'Arg5', 'Arg6', 'Arg7'))

    # objective (negative mean CV score) and onstep (prints one log row per
    # iteration) are defined elsewhere; see below.
    start = time()
    res_mlp = gp_minimize(objective, space, n_calls=50, random_state=42,
                          callback=onstep, n_jobs=-1)
    optTime = time() - start

In the objective function I run cross-validation with the MLPClassifier and return the negative mean score. If the full code is required, I will of course add it.
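For context, the objective has roughly this shape (a minimal sketch, not my exact code: the cv=5 split, the fixed random_state=42 inside MLPClassifier, and X, y as the pickled data are assumptions):

from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from skopt.utils import use_named_args

@use_named_args(space)
def objective(**params):
    # The names declared in `space` map directly onto MLPClassifier kwargs.
    model = MLPClassifier(random_state=42, **params)
    # gp_minimize minimizes, so return the negated mean cross-validation score.
    return -cross_val_score(model, X, y, cv=5, n_jobs=-1).mean()

use_named_args unpacks the named dimensions of space into keyword arguments, which is why every name in space matches an MLPClassifier parameter.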

The first 16 iterations (#0 to #15) of the run on two different machines:

Machine 1:

#  max(score)     score        Arg1   Arg2      Arg3         Arg4       Arg5  Arg6       Arg7
#0    0.47567   0.47567    adaptive  lbfgs      relu    0.0059539  0.0044584   200   (28, 14)
#1    0.50947   0.50947  invscaling  lbfgs      tanh    0.0000003  0.0072200  2000      (10,)
#2    0.53443   0.53443    adaptive    sgd      tanh    0.0000001  0.0002307  1000   (24, 45)
#3    0.53443   0.52291    constant   adam  identity    0.0000005  0.0061839   500  (100, 36)
#4    0.53443   0.49588  invscaling   adam      tanh    0.0004018  0.0001327  2000   (32, 26)
#5    0.54053   0.54053  invscaling  lbfgs  identity    0.0000085  0.0068327  1500   (50, 22)
#6    0.54053   0.27127    constant    sgd  identity    0.1103802  0.0042516   500   (32, 30)
#7    0.54053   0.00000    constant   adam  logistic    0.0001449  0.0092666  1500   (22, 16)
#8    0.54053   0.00699  invscaling    sgd      relu    0.5705199  0.0074732  1000   (32, 80)
#9    0.54053   0.00000    adaptive    sgd  logistic    0.0000235  0.0016528   200   (26, 22)
#10   0.54053   0.52548  invscaling  lbfgs  identity    0.0000001  0.0100000  1500      (10,)
#11   0.54053   0.53284  invscaling  lbfgs  identity    0.0000001  0.0100000  1000   (50, 22)
#12   0.54053   0.54007  invscaling  lbfgs  identity    0.0007139  0.0032280  1500   (50, 22)
#13   0.54053   0.00339    adaptive    sgd      relu    2.7419184  0.0090097   200   (26, 36)
#14   0.54989   0.54989    constant  lbfgs  identity   10.0000000  0.0000001   500   (50, 22)
#15   0.54989   0.54989    constant  lbfgs  identity   10.0000000  0.0000001   500   (50, 22)

Machine 2:

#  max(score)     score        Arg1   Arg2      Arg3         Arg4       Arg5  Arg6       Arg7
#0    0.47567   0.47567    adaptive  lbfgs      relu    0.0059539  0.0044584   200   (28, 14)
#1    0.50947   0.50947  invscaling  lbfgs      tanh    0.0000003  0.0072200  2000      (10,)
#2    0.53443   0.53443    adaptive    sgd      tanh    0.0000001  0.0002307  1000   (24, 45)
#3    0.53443   0.52291    constant   adam  identity    0.0000005  0.0061839   500  (100, 36)
#4    0.53443   0.49588  invscaling   adam      tanh    0.0004018  0.0001327  2000   (32, 26)
#5    0.54053   0.54053  invscaling  lbfgs  identity    0.0000085  0.0068327  1500   (50, 22)
#6    0.54053   0.27127    constant    sgd  identity    0.1103802  0.0042516   500   (32, 30)
#7    0.54053   0.00000    constant   adam  logistic    0.0001449  0.0092666  1500   (22, 16)
#8    0.54053   0.00699  invscaling    sgd      relu    0.5705199  0.0074732  1000   (32, 80)
#9    0.54053   0.00000    adaptive    sgd  logistic    0.0000235  0.0016528   200   (26, 22)
#10   0.54053   0.52548  invscaling  lbfgs  identity    0.0000001  0.0100000  1500      (10,)
#11   0.54053   0.53284  invscaling  lbfgs  identity    0.0000001  0.0100000  1000   (50, 22)
#12   0.54053   0.53878  invscaling  lbfgs  identity    0.0006349  0.0032752  1500   (50, 22)
#13   0.54053   0.00000    adaptive    sgd  logistic    0.0566396  0.0027155   200   (24, 20)
#14   0.54053   0.50515    adaptive   adam      tanh    0.0000039  0.0000959  1000   (24, 45)
#15   0.54989   0.54989  invscaling  lbfgs  identity   10.0000000  0.0000001  1500   (50, 22)

The first difference occurs at iteration #12, in the score and in Arg4 (alpha) and Arg5 (tol); from there the two runs diverge.



