Friday, December 20, 2019

Weird behavior of zipped TensorFlow dataset with random tensors

In the example below (TensorFlow 2.0), we have a dummy TensorFlow dataset with three elements. We map a function (replace_with_float) over it that returns two copies of a randomly generated value. As expected, when we take elements from the dataset, the first and second coordinates have the same value.

Now, we create two "slice" datasets from the first and second coordinates, respectively, and zip the two datasets back together. The slicing and zipping operations seem to be inverses of each other, so I would expect the resulting dataset to be equivalent to the previous one. However, as we see, the first and second coordinates are now different randomly generated values.

Maybe even more interestingly, if we zip the "same" dataset with itself via df = tf.data.Dataset.zip((df.map(lambda x, y: x), df.map(lambda x, y: x))), the two coordinates also end up with different values.

How can this behavior be explained? Perhaps two different graphs are constructed for the two datasets being zipped, and they are run independently? (A small probe sketch testing this idea follows the sample output below.)

import tensorflow as tf

def replace_with_float(element):
    # Draw a single random value and return it as both coordinates.
    rand = tf.random.uniform([])
    return (rand, rand)

df = tf.data.Dataset.from_tensor_slices([0, 0, 0])
df = df.map(replace_with_float)
print('Before zipping: ')
for x in df:
    print(x[0].numpy(), x[1].numpy())

# Slice out the first and second coordinates, then zip them back together.
df = tf.data.Dataset.zip((df.map(lambda x, y: x), df.map(lambda x, y: y)))

print('After zipping: ')
for x in df:
    print(x[0].numpy(), x[1].numpy())

Sample output:

Before zipping: 
0.08801079 0.08801079
0.638958 0.638958
0.800568 0.800568
After zipping: 
0.9676769 0.23045003
0.91056764 0.6551999
0.4647777 0.6758332
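
One way to probe the independent-execution hypothesis is a minimal sketch I put together, with a tf.print call added to the original function purely for illustration; it reveals how often replace_with_float actually executes while iterating the zipped dataset:

import tensorflow as tf

def replace_with_float(element):
    # tf.print runs once per element actually processed, so the number
    # of printed lines shows how many times this map function executes.
    tf.print('replace_with_float executed')
    rand = tf.random.uniform([])
    return (rand, rand)

df = tf.data.Dataset.from_tensor_slices([0, 0, 0])
df = df.map(replace_with_float)

zipped = tf.data.Dataset.zip((df.map(lambda x, y: x),
                              df.map(lambda x, y: y)))
for _ in zipped:
    pass

If each zip branch rebuilds and re-runs the upstream pipeline, the message should appear six times (two branches times three elements) rather than three.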

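For reference, here is a possible workaround sketch (assuming the independent-execution explanation holds): materialize the random values once with Dataset.cache() before splitting and zipping, so both branches read the same cached elements. Note that cache() only guarantees reuse after one complete pass, hence the warm-up loop:

import tensorflow as tf

def replace_with_float(element):
    rand = tf.random.uniform([])
    return (rand, rand)

df = tf.data.Dataset.from_tensor_slices([0, 0, 0])
# cache() stores elements on the first iteration; later iterations
# reuse the stored values instead of re-running tf.random.uniform.
df = df.map(replace_with_float).cache()
for _ in df:  # warm-up pass so the cache is completely filled
    pass

df = tf.data.Dataset.zip((df.map(lambda x, y: x), df.map(lambda x, y: y)))
for x in df:
    print(x[0].numpy(), x[1].numpy())  # both coordinates should now match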