I am trying to generate random points for an N×M dataset, using the lowest value of each of the M columns as the lower bound and the highest value of each column as the upper bound.
Here is the code:
import numpy as np

def generate_random_points(dataset, dimension_based=False):
    dimension = dataset.shape[1]
    if not dimension_based:
        # Use the smaller of floor(sqrt(columns)) and floor(sqrt(rows)).
        row_size = int(min(np.floor(np.sqrt(dimension)),
                           np.floor(np.sqrt(dataset.shape[0]))))
    else:
        row_size = int(np.floor(np.sqrt(dimension)))
    # Sample each column uniformly between its observed minimum and maximum.
    generated_spikes = np.random.uniform(low=np.min(dataset, axis=0),
                                         high=np.max(dataset, axis=0),
                                         size=(row_size, dimension))
    return generated_spikes
But the problem is that most of the random points lie on the boundaries or edges of the dataset space rather than being uniformly and evenly distributed.
Here is a plot of one example (the random points are the black ones):
I have also tried running PCA, taking the low and high ranges in PCA space, and mapping those ranges back with inverse_transform. Somewhat expectedly, the random points are still not distributed uniformly and evenly.
def generate_random_points(dataset, dimension_based=False):
    dimension = dataset.shape[1]
    # PCA cannot have more components than min(rows, columns).
    dimension_pca = min(dataset.shape[0], dataset.shape[1])
    # perform_PCA and perform_PCA_inverse are helper functions not shown in this post.
    pca, dataset_pca = perform_PCA(dimension_pca, dataset)
    low_pca = np.min(dataset_pca, axis=0)
    high_pca = np.max(dataset_pca, axis=0)
    # Map the PCA-space range endpoints back to the original space.
    low = perform_PCA_inverse(pca, low_pca)
    high = perform_PCA_inverse(pca, high_pca)
    if not dimension_based:
        row_size = int(min(np.floor(np.sqrt(dimension)),
                           np.floor(np.sqrt(dataset.shape[0]))))
        generated_spikes = np.random.uniform(low=low,
                                             high=high,
                                             size=(row_size, dimension))
    else:
        row_size = int(np.floor(np.sqrt(dimension)))
        # Note: this branch still samples from the raw per-column range.
        generated_spikes = np.random.uniform(low=np.min(dataset, axis=0),
                                             high=np.max(dataset, axis=0),
                                             size=(row_size, dimension))
    return generated_spikes
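A variation on the PCA attempt above, as a rough sketch only: instead of inverse-transforming the two range endpoints, draw the uniform samples in PCA coordinates and inverse-transform the samples themselves. The function name `sample_in_pca_space` and the `n_points` and `seed` parameters are hypothetical, and PCA is done here with a plain NumPy SVD rather than with the `perform_PCA` helpers, so it may not match their behaviour exactly.

```python
import numpy as np

def sample_in_pca_space(dataset, n_points, seed=None):
    """Draw uniform points in the PCA-space bounding box, then map them back.

    Hypothetical helper: PCA is computed with an SVD of the centred data.
    """
    rng = np.random.default_rng(seed)
    mean = dataset.mean(axis=0)
    centered = dataset - mean
    # Thin SVD: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T  # dataset expressed in PCA coordinates
    low = scores.min(axis=0)
    high = scores.max(axis=0)
    # Uniform samples inside the bounding box of the PCA scores.
    samples_pca = rng.uniform(low, high, size=(n_points, scores.shape[1]))
    # Inverse-transform the samples, not the range endpoints.
    return samples_pca @ vt + mean
```

For a (46, 2730) dataset this returns an array of shape (n_points, 2730) whose PCA projection stays inside the range spanned by the data on every component. Whether the result looks evenly spread in a 2-D PCA plot is something I have not confirmed.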
How can I solve this so that the randomly generated points are more evenly distributed, instead of piling up on two edges, and also do not overlap?
I need something like this:
The red marks show the required positions for the black points, which are crossed out.
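To make "evenly distributed and not overlapping" concrete, here is a minimal sketch of one candidate I am considering: Latin hypercube sampling, which splits each column's range into as many equal strata as there are points and places exactly one point in each stratum. This is a swapped-in technique, not what my code above does; the name `latin_hypercube_points` and the `n_points` and `seed` parameters are hypothetical. It only guarantees even coverage per coordinate, so I am not sure it removes the clustering seen in the PCA plot.

```python
import numpy as np

def latin_hypercube_points(dataset, n_points, seed=None):
    """Latin hypercube sample inside the per-column [min, max] box of dataset.

    Hypothetical helper, NumPy only.
    """
    rng = np.random.default_rng(seed)
    low = dataset.min(axis=0)
    high = dataset.max(axis=0)
    dim = dataset.shape[1]
    # argsort of random numbers gives an independent random permutation
    # of the strata 0..n_points-1 for every column.
    strata = np.argsort(rng.random((n_points, dim)), axis=0)
    # Jitter each point inside its stratum; values fall in [0, 1).
    unit = (strata + rng.random((n_points, dim))) / n_points
    # Rescale from the unit cube to the data's bounding box.
    return low + unit * (high - low)
```

With `n_points` equal to the `row_size` computed earlier, no two points share a stratum in any column, which is one way to read "do not overlap".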
P.S.:
- Both images are PCA representations of a dataset with shape (46, 2730), i.e. 46 rows and 2730 dimensions.
- I was thinking of using the 2nd answer to this question: "algorithm for generating uniformly distributed random points on the N-sphere". But I am not sure how to calculate the radius (R) of an N-dimensional dataset, or even whether it makes sense, so that I can use that answer.
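For reference, a minimal sketch of what I have in mind for the sphere idea, under my own assumption that the centre is the dataset centroid and R is the largest distance from the centroid to any row. This uses the standard construction (normalised Gaussian directions, radii scaled by u^(1/N)) and may or may not match the linked answer; `sample_in_ball`, `n_points` and `seed` are hypothetical names.

```python
import numpy as np

def sample_in_ball(dataset, n_points, seed=None):
    """Uniform samples in the N-ball centred on the dataset centroid.

    Assumption: R = max distance from the centroid to any data row.
    """
    rng = np.random.default_rng(seed)
    center = dataset.mean(axis=0)
    radius = np.linalg.norm(dataset - center, axis=1).max()
    dim = dataset.shape[1]
    # Normalised Gaussian vectors are uniformly distributed on the unit sphere.
    directions = rng.normal(size=(n_points, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radii with density proportional to r**(dim - 1) fill the ball uniformly.
    radii = radius * rng.random(n_points) ** (1.0 / dim)
    return center + directions * radii[:, None]
```

One caveat I am aware of: with 2730 dimensions almost all of the ball's volume is close to its surface, so points sampled this way will still sit near distance R from the centroid.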
Please help!