Friday, 16 September 2022

random.shuffle erasing items and not shuffling properly

I am initializing two multivariate Gaussian distributions as shown below, and I am trying to implement a machine learning algorithm that draws a decision boundary between the two classes:

import numpy as np
import matplotlib.pyplot as plt
import torch
import random

mu0 = [-2,-2]
mu1 = [2, 2]
cov = np.array([[1, 0],[0, 1]]) 
X = np.random.randn(10,2)
L = np.linalg.cholesky(cov)
Y0 = mu0 + X@L.T 
Y1 = mu1 + X@L.T

I have two separated circles and I am trying to stack Y0 and Y1, shuffle them, and then break them into training and testing splits. First I append the class labels to the data, and then stack.

n,m = Y1.shape
class0 = np.zeros((n,1))
class1 = np.ones((n,1))
Y_0 = np.hstack((Y0,class0))
Y_1 = np.hstack((Y1,class1))

data = np.vstack((Y_0,Y_1))

Now when I try to call random.shuffle(data), the zero class takes over and I get only a small number of class one instances.

random.shuffle(data)

Here is my data before shuffling:

print(data)
[[-3.16184428 -1.89491433  0.        ]
 [ 0.2710061  -1.41000924  0.        ]
 [-3.50742027 -2.04238337  0.        ]
 [-1.39966859 -1.57430259  0.        ]
 [-0.98356629 -3.02299622  0.        ]
 [-0.49583458 -1.64067853  0.        ]
 [-2.62577229 -2.32941225  0.        ]
 [-1.16005269 -2.76429318  0.        ]
 [-1.88618759 -2.79178253  0.        ]
 [-1.34790868 -2.10294791  0.        ]
 [ 0.83815572  2.10508567  1.        ]
 [ 4.2710061   2.58999076  1.        ]
 [ 0.49257973  1.95761663  1.        ]
 [ 2.60033141  2.42569741  1.        ]
 [ 3.01643371  0.97700378  1.        ]
 [ 3.50416542  2.35932147  1.        ]
 [ 1.37422771  1.67058775  1.        ]
 [ 2.83994731  1.23570682  1.        ]
 [ 2.11381241  1.20821747  1.        ]
 [ 2.65209132  1.89705209  1.        ]]

and after shuffling:

data
array([[-0.335667  , -0.60826166,  0.        ],
       [-0.335667  , -0.60826166,  0.        ],
       [-0.335667  , -0.60826166,  0.        ],
       [-0.335667  , -0.60826166,  0.        ],
       [-2.22547604, -1.62833794,  0.        ],
       [-3.3287687 , -2.37694753,  0.        ],
       [-3.2915737 , -1.31558952,  0.        ],
       [-2.23912202, -1.54625136,  0.        ],
       [-0.335667  , -0.60826166,  0.        ],
       [-2.23912202, -1.54625136,  0.        ],
       [-2.11217077, -2.70157476,  0.        ],
       [-3.25714184, -2.7679462 ,  0.        ],
       [-3.2915737 , -1.31558952,  0.        ],
       [-2.22547604, -1.62833794,  0.        ],
       [ 0.73756329,  1.46127708,  1.        ],
       [ 1.88782923,  1.29842524,  1.        ],
       [ 1.77452396,  2.37166206,  1.        ],
       [ 1.77452396,  2.37166206,  1.        ],
       [ 3.664333  ,  3.39173834,  1.        ],
       [ 3.664333  ,  3.39173834,  1.        ]])

Why is random.shuffle deleting my data? I just need all twenty rows to be shuffled, but it is repeating rows and I am losing data. I am not assigning the result of random.shuffle to a variable; I am simply calling random.shuffle(data). Are there any other ways to shuffle my data?
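
A likely explanation for the repeated rows: random.shuffle swaps elements with a statement of the form x[i], x[j] = x[j], x[i]. Indexing a 2D NumPy array by row returns a view rather than a copy, so the first assignment overwrites memory that the second one still reads from, and rows end up duplicated. The sketch below uses NumPy's own shuffling instead, which permutes along the first axis. The seed, the stand-in array, and the 80/20 split ratio are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

# Seeded generator; the seed value is an arbitrary choice for reproducibility.
rng = np.random.default_rng(0)

# Stand-in for the stacked (20, 3) array above: two features plus a label column.
features = rng.standard_normal((20, 2))
labels = np.repeat([0.0, 1.0], 10)[:, None]
data = np.hstack((features, labels))

# rng.permutation returns a copy with the rows reordered; data itself is unchanged.
shuffled = rng.permutation(data)

# To shuffle in place instead, use rng.shuffle(data) or np.random.shuffle(data).

# Hypothetical 80/20 train/test split of the shuffled rows.
n_train = int(0.8 * len(shuffled))
train, test = shuffled[:n_train], shuffled[n_train:]
```

Every row appears exactly once in shuffled, so both classes keep all ten of their instances.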



