Friday, December 4, 2020

Take a random sample of data points from the original data in Python

I have loaded the MNIST digits dataset, which looks like this:

from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt

digits = datasets.load_digits(n_class=6)
X = digits.data
y = digits.target

n_samples, n_features = X.shape
n_neighbors = 30


# visualizing input data
a = 20
img = np.zeros((10 * a, 10 * a))
for i in range(a):
    ix = 10 * i + 1
    for j in range(a):
        iy = 10 * j + 1
        img[ix:ix + 8, iy:iy + 8] = X[i * a + j].reshape((8, 8))

plt.figure(figsize=(5,5))
plt.imshow(img, cmap=plt.cm.binary);
plt.xticks([]);
plt.yticks([]);

# compute time
ctime = np.zeros(8)

# set file name
f_file = 'ctime.csv'

What I want to do now is vary the number of input data points for each fit by taking a random sample of points from the original data. I would like to do this for 21 sample sizes that are equidistant on a logarithmic scale (base 10) between the endpoints 100 and 1000 (1083 is the total number of points in this MNIST subset), rounded to integers.

I guess I have to use the random.sample() function, but I have no idea how to apply it to the original data. Any advice or hint would be really helpful.
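A minimal sketch of the sampling step, assuming NumPy: random.sample() works on Python sequences, but with NumPy arrays it is more idiomatic to draw row indices without replacement and use them to index both X and y (the array shapes below mirror the digits data; the seed is only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the subset is reproducible

# Stand-in for the digits data: 1083 samples, 64 features
X = rng.normal(size=(1083, 64))
y = rng.integers(0, 6, size=1083)

n = 100  # desired sample size
idx = rng.choice(X.shape[0], size=n, replace=False)  # row indices, no duplicates
X_sub, y_sub = X[idx], y[idx]

print(X_sub.shape)  # (100, 64)
```

Indexing with the same idx array keeps each sampled row paired with its correct label.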

I also wrote the following code for the sample sizes that are equidistant on a logarithmic scale.

N = 21
# np.logspace expects base-10 exponents, so pass log10 of the endpoints
x1 = np.logspace(np.log10(100), np.log10(1000), N, endpoint=True)

I would like to know what method I have to use to apply these sample sizes to my MNIST data, as this is my first time working with this type of dataset.
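Putting the pieces together, a sketch under stated assumptions: the KNeighborsClassifier fit is only a placeholder for whatever model is actually being timed (chosen because n_neighbors = 30 appears above), and the CSV name matches the f_file value set earlier. Rounding the log-spaced values to integers and passing them through np.unique guards against duplicate sizes:

```python
import time
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits(n_class=6)
X, y = digits.data, digits.target
rng = np.random.default_rng(seed=0)

# 21 sample sizes, equidistant on a log10 scale from 100 to 1000, rounded to int
sizes = np.unique(np.round(np.logspace(np.log10(100), np.log10(1000), 21)).astype(int))

ctime = np.zeros(len(sizes))
for k, n in enumerate(sizes):
    idx = rng.choice(X.shape[0], size=n, replace=False)  # random subset of rows
    X_sub, y_sub = X[idx], y[idx]
    t0 = time.perf_counter()
    KNeighborsClassifier(n_neighbors=30).fit(X_sub, y_sub)  # placeholder fit to time
    ctime[k] = time.perf_counter() - t0

# one row per size: sample size, fit time in seconds
np.savetxt('ctime.csv', np.column_stack([sizes, ctime]),
           delimiter=',', header='n,seconds', comments='')
```

For these particular endpoints the 21 rounded values happen to be distinct, so np.unique leaves all of them in place.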



