Saturday, 9 October 2021

How to generate non-overlapping random points, uniformly and evenly, within an N-dimensional space or dataset between a low and a high range

I have tried to generate random points for an N×M dataset, using the lowest value of each of the M columns as the low bound and the highest value of each column as the high bound.

Here is the code:

import numpy as np

def generate_random_points(dataset, dimension_based=False):
    # Number of features (columns) in the dataset.
    dimension = dataset.shape[1]
    # Number of points to generate: floor(sqrt(number of features)).
    row_size = int(np.floor(np.sqrt(dimension)))
    if not dimension_based:
        # Cap the count at floor(sqrt(number of rows)).
        row_size = min(row_size, int(np.floor(np.sqrt(dataset.shape[0]))))
    # Sample each feature uniformly between its observed minimum and maximum.
    generated_spikes = np.random.uniform(low=np.min(dataset, axis=0),
                                         high=np.max(dataset, axis=0),
                                         size=(row_size, dimension))
    return generated_spikes

But the problem is that most of the random points lie on the boundaries or edges of the dataset space rather than being uniformly and evenly distributed.

Here is a plot of one example; the random points are the black ones:

I have also tried performing PCA and then obtaining the low and high ranges by applying inverse_transform to the PCA-space ranges but, somewhat expectedly, the random points are still not distributed uniformly and evenly:

import numpy as np

def generate_random_points(dataset, dimension_based=False):
    # perform_PCA and perform_PCA_inverse are helper functions not shown here.
    dimension = dataset.shape[1]
    # Number of PCA components: the smaller of the row and column counts.
    dimension_pca = min(dataset.shape)
    pca, dataset_pca = perform_PCA(dimension_pca, dataset)
    # Bounds of the data in PCA space, mapped back to the original space.
    low_pca = np.min(dataset_pca, axis=0)
    high_pca = np.max(dataset_pca, axis=0)
    low = perform_PCA_inverse(pca, low_pca)
    high = perform_PCA_inverse(pca, high_pca)
    if not dimension_based:
        row_size = min(int(np.floor(np.sqrt(dimension))),
                       int(np.floor(np.sqrt(dataset.shape[0]))))
        generated_spikes = np.random.uniform(low=low,
                                             high=high,
                                             size=(row_size, dimension))
        return generated_spikes
    else:
        # This branch samples between the raw per-feature bounds, not the PCA-derived ones.
        row_size = int(np.floor(np.sqrt(dimension)))
        generated_spikes = np.random.uniform(low=np.min(dataset, axis=0),
                                             high=np.max(dataset, axis=0),
                                             size=(row_size, dimension))
        return generated_spikes
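
One possible reason the PCA attempt behaves unexpectedly: inverse-transforming the two bound vectors yields just two points in the original space, and sampling each feature between them does not reproduce the PCA-space bounding box. A minimal sketch of an alternative is to sample inside the box in PCA coordinates and inverse-transform the sampled points instead. The function name sample_in_pca_box and the parameters n_points and seed are hypothetical, and PCA is done here with a plain NumPy SVD rather than the perform_PCA helper, whose definition is not shown.

```python
import numpy as np

def sample_in_pca_box(dataset, n_points, seed=None):
    """Sample uniformly inside the PCA-space bounding box of `dataset`
    and map the samples back to the original feature space."""
    rng = np.random.default_rng(seed)
    mean = dataset.mean(axis=0)
    centered = dataset - mean
    # Principal axes via SVD; the rows of `components` are orthonormal.
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of the data along the principal axes.
    scores = centered @ components.T
    low, high = scores.min(axis=0), scores.max(axis=0)
    # Uniform sampling in PCA coordinates, where the box follows the data.
    samples_pca = rng.uniform(low, high, size=(n_points, scores.shape[1]))
    # Inverse transform of the sampled points (not of the bounds).
    return samples_pca @ components + mean
```

When the result is projected onto the same principal axes, every point falls inside the rectangle spanned by the data in the plot. The points are independent uniform draws, so this does not by itself spread them evenly.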

How can I solve this so that the randomly generated points are more evenly distributed, instead of piling up on two edges, and also do not overlap?

I need something like this:

The red points mark the required positions for the black points that are crossed out.
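
One way to get a more even spread than independent uniform draws is Latin hypercube sampling, sketched below under assumptions. This is a swapped-in technique, not something from the code above: each feature's range is cut into n_points equal strata and every stratum is used exactly once per feature, so no two points share a stratum along any feature with a non-zero range. The function name latin_hypercube_points and the parameters n_points and seed are hypothetical.

```python
import numpy as np

def latin_hypercube_points(dataset, n_points, seed=None):
    """Latin hypercube sample inside the per-feature [min, max] box."""
    rng = np.random.default_rng(seed)
    low = dataset.min(axis=0)
    high = dataset.max(axis=0)
    dim = dataset.shape[1]
    # One random position inside each of n_points equal-width strata of [0, 1).
    strata = np.arange(n_points)[:, None]
    unit = (strata + rng.random((n_points, dim))) / n_points
    # Shuffle the strata independently for every feature, so the points
    # do not all line up along the diagonal of the box.
    order = rng.random((n_points, dim)).argsort(axis=0)
    unit = np.take_along_axis(unit, order, axis=0)
    # Rescale from the unit cube to the data's bounding box.
    return low + unit * (high - low)
```

For example, latin_hypercube_points(dataset, 6) returns an array of shape (6, dataset.shape[1]). The stratification keeps points from coinciding, but it does not enforce a minimum distance between them.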

P.S:

  1. Both images are PCA representations of a dataset with shape (46, 2730), i.e. 46 rows and 2730 dimensions.

  2. I was thinking of using the 2nd answer to this question: "Algorithm for generating uniformly distributed random points on the N-sphere". But I am not sure how to calculate the radius (R) of an N-dimensional dataset, or whether it even makes sense to apply that answer here.
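
To make the radius question concrete, here is a minimal sketch of uniform sampling inside an N-dimensional ball around the data. It rests on assumptions that are not from the linked answer: the centre is taken to be the dataset mean, and R is taken to be the largest distance from that mean to any row. The function name sample_in_ball and the parameters n_points and seed are hypothetical.

```python
import numpy as np

def sample_in_ball(dataset, n_points, seed=None):
    """Sample uniformly inside a ball centred on the dataset mean."""
    rng = np.random.default_rng(seed)
    center = dataset.mean(axis=0)
    # Assumption: R is the largest distance from the centroid to any row.
    radius = np.linalg.norm(dataset - center, axis=1).max()
    dim = dataset.shape[1]
    # Normalised Gaussian vectors give directions uniform on the sphere.
    directions = rng.normal(size=(n_points, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # U**(1/dim) makes the points uniform in volume, not bunched at the centre.
    radii = radius * rng.random(n_points) ** (1.0 / dim)
    return center + directions * radii[:, None]
```

With 2730 dimensions, almost all of a ball's volume lies close to its surface, so points that are uniform in volume will nearly all sit at a distance close to R from the centre. Whether that is the desired behaviour depends on what "evenly distributed" should mean for this dataset.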

Please help!



