Monday, December 16, 2019

ValueError: Found array with dim 4. check_pairwise_arrays expected <= 2

I am trying to visualize a dataset with t-SNE using this post.

The call X_embedded = fit(X) raises the following error:

Traceback (most recent call last):
  File "/home/user/project/tSNE_visualization.py", line 166, in <module>
    X_embedded = fit(X)
  File "/home/user/project/tSNE_visualization.py", line 83, in fit
    distances = pairwise_distances(X, metric='euclidean', squared=True)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/metrics/pairwise.py", line 1247, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/metrics/pairwise.py", line 1090, in _parallel_pairwise
    return func(X, Y, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/metrics/pairwise.py", line 223, in euclidean_distances
    X, Y = check_pairwise_arrays(X, Y)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/metrics/pairwise.py", line 107, in check_pairwise_arrays
    warn_on_dtype=warn_on_dtype, estimator=estimator)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 451, in check_array
    % (array.ndim, estimator_name))
ValueError: Found array with dim 4. check_pairwise_arrays expected <= 2.

I know that X must have at most 2 dimensions for the fit function to work, so I reshaped X from (1433, 224, 224, 3) to (1433, 150528) using these lines:

N_samples, img_width, img_height, ch = X.shape
X_reshaped = X.reshape(N_samples, -1 )
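
As a sanity check on the reshape, here is a minimal sketch with a small placeholder array (5 images of 8x8x3 rather than the real 1433 images of 224x224x3; the random data and the sizes are assumptions for illustration only). It just confirms that each image becomes one row vector, which is the 2-D layout the distance computation expects:

```python
import numpy as np

# Placeholder stand-in for the real image array (assumption);
# the real X has shape (1433, 224, 224, 3).
X = np.random.rand(5, 8, 8, 3).astype(np.float32)

n_samples = X.shape[0]
# Flatten every image into one feature vector: (5, 8, 8, 3) -> (5, 192).
X_reshaped = X.reshape(n_samples, -1)

print(X_reshaped.ndim)   # 2
print(X_reshaped.shape)  # (5, 192)
```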

Then I ran it again. This time the error is:

Traceback (most recent call last):
  File "/home/user/project/tSNE_visualization.py", line 162, in <module>
    X_embedded = fit(X_reshaped)
  File "/home/user/project/tSNE_visualization.py", line 92, in fit
    return _tsne(P, degrees_of_freedom, n_samples, X_embedded=X_embedded)
  File "/home/user/project/tSNE_visualization.py", line 98, in _tsne
    params = _gradient_descent(obj_func, params, [P, degrees_of_freedom, n_samples, n_components])
  File "/home/user/project/tSNE_visualization.py", line 134, in _gradient_descent
    error, grad = obj_func(p, *args)
  File "/home/user/project/tSNE_visualization.py", line 116, in _kl_divergence
    grad[i] = np.dot(np.ravel(PQd[i], order='K'), X_embedded[i] - X_embedded)
ValueError: setting an array element with a sequence.

Also, in the PyCharm IDE, the line X_embedded = 1e-4 * np.random.mtrand._rand.randn(n_samples, n_components).astype(np.float32) in the function below triggers the warning: Cannot find reference 'mtrand' in '__init__.py'

def fit(X):
    n_samples = X.shape[0]
    # Compute euclidean distance
    distances = pairwise_distances(X, metric='euclidean', squared=True)
    # Compute joint probabilities p_ij from distances.
    P = _joint_probabilities(distances=distances, desired_perplexity=perplexity, verbose=False)
    # The embedding is initialized with iid samples from Gaussians with standard deviation 1e-4.
    X_embedded = 1e-4 * np.random.mtrand._rand.randn(n_samples, n_components).astype(np.float32)
    # degrees_of_freedom = n_components - 1 comes from
    # "Learning a Parametric Embedding by Preserving Local Structure"
    # Laurens van der Maaten, 2009.
    degrees_of_freedom = max(n_components - 1, 1)
    return _tsne(P, degrees_of_freedom, n_samples, X_embedded=X_embedded)

Suspecting that this line was not behaving as intended, I looked for other ways to reach mtrand._rand.randn, such as

from numpy.random.mtrand import RandomState 

and tried it as X_embedded = 1e-4 * RandomState.randn(n_samples, n_components).astype(np.float32)

It still fails, this time with an error I have not found a solution for: TypeError: descriptor 'randn' requires a 'mtrand.RandomState' object but received a 'int'
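
The wording of that TypeError points to randn being called on the RandomState class itself, so n_samples is received where an instance is expected. A minimal sketch of the same initialization using an instance instead, assuming placeholder sizes and an arbitrary seed (neither is from the original script):

```python
import numpy as np
from numpy.random import RandomState

# Placeholder sizes for illustration (assumptions, not the real data).
n_samples, n_components = 5, 2

# randn is an instance method, so create a RandomState first.
# The seed 0 is arbitrary and only makes the draw reproducible.
rng = RandomState(0)
X_embedded = 1e-4 * rng.randn(n_samples, n_components).astype(np.float32)

# np.random.randn draws from NumPy's global RandomState and avoids
# touching the private np.random.mtrand._rand attribute.
X_embedded_alt = 1e-4 * np.random.randn(n_samples, n_components).astype(np.float32)

print(X_embedded.shape, X_embedded.dtype)
```

This only addresses the initialization line; it says nothing about the "setting an array element with a sequence" error raised inside _kl_divergence.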

If you have any way to solve this, I'd appreciate it. Thanks.



