I am sampling from two multivariate Gaussian distributions as shown below, and I am trying to implement a machine learning algorithm that draws a decision boundary between the two classes:
import numpy as np
import matplotlib.pyplot as plt
import torch
import random
mu0 = [-2,-2]
mu1 = [2, 2]
cov = np.array([[1, 0],[0, 1]])
X = np.random.randn(10,2)
L = np.linalg.cholesky(cov)
Y0 = mu0 + X@L.T
Y1 = mu1 + X@L.T
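A side note on the sampling step: both classes above reuse the same X, so Y1 is just Y0 shifted by mu1 - mu0 (each coordinate differs by exactly 4). If independent draws per class are wanted, a minimal sketch could look like the following. The names rng, X0 and X1 and the seed are assumptions added for illustration and are not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary, only for reproducibility

mu0 = np.array([-2.0, -2.0])
mu1 = np.array([2.0, 2.0])
cov = np.array([[1.0, 0.0], [0.0, 1.0]])
L = np.linalg.cholesky(cov)  # cov = L @ L.T

# Separate standard-normal draws for each class (hypothetical names X0, X1)
X0 = rng.standard_normal((10, 2))
X1 = rng.standard_normal((10, 2))

Y0 = mu0 + X0 @ L.T  # samples with mean mu0 and covariance cov
Y1 = mu1 + X1 @ L.T  # samples with mean mu1 and covariance cov
```

With separate draws, the two point clouds no longer mirror each other row by row.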
I have two separated circles and I am trying to stack Y0 and Y1, shuffle them, and then break them into training and testing splits. First I append the class labels to the data, and then stack.
n,m = Y1.shape
class0 = np.zeros((n,1))
class1 = np.ones((n,1))
Y_0 = np.hstack((Y0,class0))
Y_1 = np.hstack((Y1,class1))
data = np.vstack((Y_0,Y_1))
Now when I call random.shuffle(data), the zero class takes over and I am left with only a small number of class-one instances:
random.shuffle(data)
Here is my data before shuffling:
print(data)
[[-3.16184428 -1.89491433 0. ]
[ 0.2710061 -1.41000924 0. ]
[-3.50742027 -2.04238337 0. ]
[-1.39966859 -1.57430259 0. ]
[-0.98356629 -3.02299622 0. ]
[-0.49583458 -1.64067853 0. ]
[-2.62577229 -2.32941225 0. ]
[-1.16005269 -2.76429318 0. ]
[-1.88618759 -2.79178253 0. ]
[-1.34790868 -2.10294791 0. ]
[ 0.83815572 2.10508567 1. ]
[ 4.2710061 2.58999076 1. ]
[ 0.49257973 1.95761663 1. ]
[ 2.60033141 2.42569741 1. ]
[ 3.01643371 0.97700378 1. ]
[ 3.50416542 2.35932147 1. ]
[ 1.37422771 1.67058775 1. ]
[ 2.83994731 1.23570682 1. ]
[ 2.11381241 1.20821747 1. ]
[ 2.65209132 1.89705209 1. ]]
and after shuffling:
data
array([[-0.335667 , -0.60826166, 0. ],
[-0.335667 , -0.60826166, 0. ],
[-0.335667 , -0.60826166, 0. ],
[-0.335667 , -0.60826166, 0. ],
[-2.22547604, -1.62833794, 0. ],
[-3.3287687 , -2.37694753, 0. ],
[-3.2915737 , -1.31558952, 0. ],
[-2.23912202, -1.54625136, 0. ],
[-0.335667 , -0.60826166, 0. ],
[-2.23912202, -1.54625136, 0. ],
[-2.11217077, -2.70157476, 0. ],
[-3.25714184, -2.7679462 , 0. ],
[-3.2915737 , -1.31558952, 0. ],
[-2.22547604, -1.62833794, 0. ],
[ 0.73756329, 1.46127708, 1. ],
[ 1.88782923, 1.29842524, 1. ],
[ 1.77452396, 2.37166206, 1. ],
[ 1.77452396, 2.37166206, 1. ],
[ 3.664333 , 3.39173834, 1. ],
[ 3.664333 , 3.39173834, 1. ]])
Why is random.shuffle deleting my data? I just need all twenty rows to be shuffled, but rows are being repeated and I am losing data. I am not assigning the result of random.shuffle to a variable; I am simply calling random.shuffle(data). Are there any other ways to simply shuffle my data?
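One likely explanation, offered as a sketch and not a definitive diagnosis: random.shuffle swaps elements with a tuple assignment of the form x[i], x[j] = x[j], x[i]. Indexing a row of a 2-D NumPy array returns a view, not a copy, so the first assignment overwrites a row before the second one reads it, and that row ends up duplicated. The example below reproduces the effect on a small array and then shuffles rows with NumPy's own Generator.permutation. The array contents, the seed, and the 15/5 split are made-up stand-ins, not the data from the question.

```python
import numpy as np

# Reproduce the problem: swapping rows via views duplicates a row.
a = np.arange(6).reshape(3, 2)
a[0], a[1] = a[1], a[0]  # a[0] is overwritten before it is copied into a[1]
# rows 0 and 1 of `a` are now identical

# Stand-in for the stacked array: 20 rows, last column is the class label.
features = np.arange(40, dtype=float).reshape(20, 2)
labels = np.repeat([0.0, 1.0], 10).reshape(-1, 1)
data = np.hstack((features, labels))

rng = np.random.default_rng(0)    # arbitrary seed for reproducibility
shuffled = rng.permutation(data)  # shuffled copy along the first axis
# In-place alternative: rng.shuffle(data)

# Example split (sizes are an assumption): 15 training rows, 5 test rows
train, test = shuffled[:15], shuffled[15:]
```

Because permutation works on whole rows, every feature row stays attached to its label and all twenty rows survive the shuffle.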