random: Why does stacking CNN wreck reproducibility (even with seed & CPU)?

Saturday, 2 November 2019

REPRODUCIBLE:

ipt = Input(batch_shape=batch_shape)
x   = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt)
x   = Flatten()(x)
out = Dense(6, activation='softmax')(x)

NOT REPRODUCIBLE:

ipt = Input(batch_shape=batch_shape)
x   = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt)
x   = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(x)
x   = Flatten()(x)
out = Dense(6, activation='softmax')(x)

The discrepancy grows substantially with a larger model and real data instead of random noise: up to a 30% relative difference in accuracy within a single small epoch. Environment setup, the sources considered, and a full minimal reproducible example are below. Relevant Git

What is the problem, and how to fix it?

POSSIBLE SOURCES: ([x] = ruled out)

[x] TF2 vs. TF1; Keras 2.3.0+ vs. Keras 2.2.5 (tested both)
[x] Random seeds (numpy, tf, random, PYTHONHASHSEED)
[x] Data values / shuffling (same values, no shuffling)
[x] Weight initializations (same values)
[x] GPU usage (used CPU)
[x] Numeric imprecision (used float64; moreover, the discrepancy is too large to be explained by numeric imprecision)
[x] Bad CUDA install (all official guide tests passed, TF detects GPU & CUDA)
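
Two of the checks above (same data values, same weight initializations) can be confirmed by hashing the arrays in each run and comparing the digests. Below is a minimal sketch, not the method actually used for this question: the helper name `fingerprint` is hypothetical, and the arrays are stand-ins generated with NumPy. With Keras, one would pass `model.get_weights()` (a list of NumPy arrays) instead.

```python
import hashlib
import numpy as np

def fingerprint(arrays):
    """Return a SHA-256 hex digest over a list of NumPy arrays (hypothetical helper)."""
    h = hashlib.sha256()
    for a in arrays:
        a = np.ascontiguousarray(a)
        # Include dtype and shape so reshaped or recast arrays hash differently.
        h.update(str(a.dtype).encode())
        h.update(str(a.shape).encode())
        h.update(a.tobytes())
    return h.hexdigest()

# Stand-in "weights" for two runs, each drawn from the same seed.
rng = np.random.RandomState(1)
w_run1 = [rng.randn(8, 8, 3, 6), rng.randn(6)]
rng = np.random.RandomState(1)
w_run2 = [rng.randn(8, 8, 3, 6), rng.randn(6)]

# True only if every array is bit-for-bit identical between the two runs.
print(fingerprint(w_run1) == fingerprint(w_run2))
```

Identical digests before training and different digests after training would place the divergence inside training itself, rather than in initialization or data.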

ENVIRONMENT:

CUDA 10.0.130, cuDNN 7.6.0, Windows 10, GTX 1070
Python 3.7.4, Spyder 3.3.6, Anaconda 3.0 10/19
Anaconda Powershell Prompt terminal to set PYTHONHASHSEED and start Spyder

OBSERVATIONS:

float64 vs. float32 - no noticeable difference
CPU vs. GPU - no noticeable difference
Also non-reproducible for Conv1D
Reproducible when Dense replaces Conv; other layers not tested
For a larger model (though still a small one), loss and accuracy vary substantially within a single epoch:

one_epoch_loss = [1.6814, 1.6018, 1.6577, 1.6789, 1.6878, 1.7022, 1.6689]
one_epoch_acc  = [0.2630, 0.3213, 0.2991, 0.3185, 0.2583, 0.2463, 0.2815]
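
As a quick arithmetic check on the figures above, the spread can be computed directly. This sketch assumes the seven values come from seven otherwise identical runs and measures the spread relative to the lowest value; `relative_spread` is a hypothetical helper name, and other definitions of "relative difference" would give different numbers.

```python
one_epoch_loss = [1.6814, 1.6018, 1.6577, 1.6789, 1.6878, 1.7022, 1.6689]
one_epoch_acc  = [0.2630, 0.3213, 0.2991, 0.3185, 0.2583, 0.2463, 0.2815]

def relative_spread(values):
    """(max - min) / min: spread relative to the lowest value (one possible definition)."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo

acc_spread = relative_spread(one_epoch_acc)
loss_spread = relative_spread(one_epoch_loss)
print(f"accuracy spread: {acc_spread:.1%}")   # 30.5%
print(f"loss spread:     {loss_spread:.1%}")  # 6.3%
```

Under this definition, the accuracy spread is consistent with the roughly 30% relative difference mentioned earlier.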

CODE:

batch_shape = (32, 64, 64, 3)
num_samples = 1152

ipt = Input(batch_shape=batch_shape)
x   = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(ipt)
x   = Conv2D(6, (8, 8), strides=(2, 2), activation='relu')(x)
x   = Flatten()(x)
out = Dense(6, activation='softmax')(x)
model = Model(ipt, out)
model.compile('adam', 'sparse_categorical_crossentropy')

X = np.random.randn(num_samples, *batch_shape[1:])
y = np.random.randint(0, 6, (num_samples, 1))

reset_seeds()
model.fit(X, y, epochs=5, shuffle=False)
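
To compare two runs of the script, it may help to record the per-epoch losses and find where they first diverge. In Keras, `model.fit(...)` returns a `History` object whose `.history['loss']` is a list of per-epoch losses. The sketch below assumes two such lists are already available; `first_divergence` is a hypothetical helper, and the numbers are placeholders for illustration, not measured results.

```python
def first_divergence(losses_a, losses_b, tol=0.0):
    """Index of the first epoch whose losses differ by more than tol, else None."""
    for i, (a, b) in enumerate(zip(losses_a, losses_b)):
        if abs(a - b) > tol:
            return i
    return None

# Placeholder loss histories for illustration only (not measured results).
run_a = [1.80, 1.79, 1.78]
run_b = [1.80, 1.79, 1.75]

print(first_divergence(run_a, run_b))  # index of the first differing epoch
```

With `tol=0.0` the comparison is exact, which is the relevant test for bit-for-bit reproducibility. A small positive `tol` would instead allow for floating-point noise.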

Imports / setup:

import os
os.environ['PYTHONHASHSEED'] = '0'
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import numpy as np
np.random.seed(1)
import random
random.seed(2)

import tensorflow as tf

def reset_seeds():
    np.random.seed(1)
    random.seed(2)
    if tf.__version__[0] == '2':
        tf.random.set_seed(3)
    else:
        tf.set_random_seed(3)
    print("RANDOM SEEDS RESET")

reset_seeds()

from keras.layers import Input, Dense, Conv2D, Flatten
from keras.models import Model
import keras.backend as K

K.set_floatx('float64')
