Friday, December 18, 2020

Generating random samples from a dataset and checking them with an evaluator function

I have written a function that randomly selects 303 data points out of a total of 506. From those 303 points I then randomly pick 203 to replicate, which brings the sample back to 506 points. Using this technique I want to create 30 samples from a dataset (the Boston dataset).

Here is an example to make the procedure clearer:

Assume we have 10 data points [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. First we take 6 data points at random, say [4, 5, 7, 8, 9, 3]. Next we replicate 4 points from [4, 5, 7, 8, 9, 3], say [5, 8, 3, 7], so the final sample is [4, 5, 7, 8, 9, 3, 5, 8, 3, 7].
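
The toy example above can be sketched in a few lines. This is only an illustration, assuming NumPy's Generator API; the seed and the variable names are arbitrary choices and not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
points = np.arange(1, 11)       # the 10 data points 1..10

# Step 1: take 6 distinct points (replace=False rules out duplicates here).
selected = rng.choice(points, size=6, replace=False)

# Step 2: choose 4 of those 6 points to replicate.
replicated = rng.choice(selected, size=4, replace=False)

# Final sample: the 6 selected points followed by the 4 replicated ones.
sample = np.concatenate([selected, replicated])
```

The final sample always has 10 entries but only 6 distinct values, which is the property the grader later checks at full scale (506 entries, 303 distinct).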

I have written the following code for this purpose:

import numpy as np

def generating_samples(input_data, target_data):

    # pick 303 row indices, then 203 of those rows to repeat
    selecting_rows = np.random.choice(len(input_data), 303)
    replacing_rows = np.random.choice(selecting_rows, 203, replace=False)
    # pick column indices
    selecting_columns = np.random.choice(3, 13, 1)
    sample_data = input_data[selecting_rows[:, None], selecting_columns]
    target_of_sample_data = target_data[selecting_rows]

    # replicating data
    replicated_sample_data = input_data[replacing_rows]
    target_of_replicated_sample_data = target_data[replacing_rows]

    # concatenating data
    final_sample_data = np.vstack((sample_data, replicated_sample_data))
    final_target_data = np.vstack((target_of_sample_data.reshape(-1, 1), target_of_replicated_sample_data.reshape(-1, 1)))

    return final_sample_data, final_target_data, selecting_rows, selecting_columns

Below is the grader function used to evaluate this code:

def grader_samples(a,b,c,d):
    length = (len(a)==506  and len(b)==506)
    sampled = (len(a)-len(set([str(i) for i in a]))==203)
    rows_length = (len(c)==303)
    column_length= (len(d)>=3)
    assert(length and sampled and rows_length and column_length)
    return True
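
To make the `sampled` check concrete: it subtracts the number of distinct rows (compared by their string form) from the total number of rows, so it counts how many rows are repeats. A small sketch with made-up data; the helper name `duplicate_count` is hypothetical and only mirrors the expression used in the grader.

```python
import numpy as np

def duplicate_count(rows):
    # Same expression as in the grader: total rows minus distinct rows,
    # where rows are compared through their string representation.
    return len(rows) - len(set(str(r) for r in rows))

# Made-up data: 5 rows, of which only 3 are distinct.
toy = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [1.0, 2.0],
                [5.0, 6.0],
                [3.0, 4.0]])

dups = duplicate_count(toy)  # 5 rows - 3 distinct rows = 2
```

For the grader to pass, this count has to be exactly 203, which means the 506 returned rows must contain exactly 303 distinct ones.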

The other checks on a, b, c and d all come out True, but I get an AssertionError because of this statement:

sampled = (len(a)-len(set([str(i) for i in a]))==203)

The value should equal 203, but it does not. Can someone help me with this issue?
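
One possible cause, offered as a sketch and not a confirmed diagnosis: `np.random.choice` samples with replacement by default, so the 303 selected rows can already contain repeats before the 203 replicated rows are added. In addition, `np.random.choice(3, 13, 1)` draws 13 values from `range(3)`, and the replicated rows are taken without the column selection. The variant below assumes the intent is 303 distinct rows and between 3 and 13 distinct columns; the function name, the `rng` parameter, and the random stand-in data are illustrative only.

```python
import numpy as np

def generating_samples_sketch(input_data, target_data, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n_rows, n_cols = input_data.shape

    # 303 distinct row indices (replace=False avoids repeats at this stage).
    selecting_rows = rng.choice(n_rows, size=303, replace=False)
    # 203 of those rows, each repeated once.
    replacing_rows = rng.choice(selecting_rows, size=203, replace=False)
    # Assumption: at least 3 and at most n_cols distinct columns.
    n_selected = rng.integers(3, n_cols + 1)
    selecting_columns = rng.choice(n_cols, size=n_selected, replace=False)

    # Apply the same column selection to both parts so the rows line up.
    sample_data = input_data[selecting_rows[:, None], selecting_columns]
    replicated_data = input_data[replacing_rows[:, None], selecting_columns]

    final_sample_data = np.vstack((sample_data, replicated_data))
    final_target_data = np.vstack((target_data[selecting_rows].reshape(-1, 1),
                                   target_data[replacing_rows].reshape(-1, 1)))
    return final_sample_data, final_target_data, selecting_rows, selecting_columns

# Stand-in data with the same shape as the Boston dataset (506 x 13).
demo_rng = np.random.default_rng(0)
X = demo_rng.random((506, 13))
y = demo_rng.random(506)

a, b, c, d = generating_samples_sketch(X, y, demo_rng)
duplicates = len(a) - len(set(str(row) for row in a))
```

On real data the count can still exceed 203 if two different rows happen to hold identical values in the chosen columns, since the grader compares row contents and not row indices.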



