Wednesday, 4 November 2015

Using a skewed distribution to simulate a participant's "error" in a cognitive task

I am working on a simulation that reproduces people's performance on a cognitive task. In the task, participants estimate how long an object is displayed on a screen.

The data I have are the mean error of their responses, the standard deviation of that error, the skewness of the data, and the percent error of their estimates.
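To make these quantities concrete, here is a minimal sketch of how they could be computed from raw trials. It assumes that "error" means the ratio estimate / true duration, which is how the simulation code further down uses it, and that percent error is |estimate - true| / true * 100. The function name `summarize_errors` and the example responses are hypothetical, not experimental data.

```python
import numpy as np


def summarize_errors(true_times, estimates):
    """Summary statistics of estimation error (hypothetical helper).

    Assumes error is the ratio estimate / true time, matching the
    multiplicative simulation.
    """
    true_times = np.asarray(true_times, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    ratios = estimates / true_times

    mean = ratios.mean()
    std = ratios.std(ddof=1)  # sample standard deviation
    # Fisher-Pearson (biased) skewness, the definition scipy.stats.skew uses by default
    centered = ratios - mean
    skew = (centered ** 3).mean() / (centered ** 2).mean() ** 1.5
    percent_error = (np.abs(estimates - true_times) / true_times * 100).mean()
    return mean, std, skew, percent_error


# Hypothetical responses for three trials (milliseconds)
print(summarize_errors([2502, 4376, 6255], [2000, 3900, 5000]))
```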

I simulate their performance by randomly giving the simulator a "time" value, which corresponds to the true amount of time the object would remain on screen in an experiment.

I then multiply that true time by a value sampled from a distribution defined by their mean error and the standard deviation of that error. This effectively reproduces their "estimate".
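One consequence of this multiplicative model is worth spelling out. If an estimate is `time * r`, then its percent error is `|r - 1| * 100`, whatever the true time is. The small sketch below checks this with two of the durations used in the code. The function name `percent_error` is just for illustration.

```python
def percent_error(true_time, ratio):
    """Percent error of an estimate produced as true_time * ratio."""
    estimate = true_time * ratio
    return abs(estimate - true_time) / true_time * 100


# The true time cancels out: both calls give |0.8 - 1| * 100, i.e. 20 up to floating-point rounding
print(percent_error(2502, 0.8), percent_error(9374, 0.8))
```

In this setup, the choice of time pool changes the simulated estimates but not the simulated percent error. The percent error is driven entirely by the distribution of the ratio.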

Here is the code I currently have. It does almost exactly what I need, but there's a catch.

import random
import csv
import numpy


A = [2502, 4376, 6255]  # the two pools of durations (in milliseconds) that an object will actually remain on the screen
B = [3753, 6572, 9374]


def time_and_number(pnum, dots, trials):
    # Gutted helper that pulls the relevant data from a CSV, but these values could be anything.
    with open('workingdurationavgdata.csv', newline='') as f:
        data = list(csv.reader(f))
    ratio_avg = float(data[pnum - 1][dots - 1])    # mean error
    ratio_std = float(data[pnum - 1][dots + 3])    # standard deviation of error
    ideal_ratio = float(data[pnum - 1][dots + 7])  # the participant's 'true' percent error, gathered experimentally;
                                                   # used to check whether the simulation reproduces performance

    estlist = []    # list of generated 'estimates'
    errorlist = []  # list of percent errors
    for i in range(trials):
        # Randomly choose which time pool (above) to draw a duration from.
        # numpy.random.randint excludes its upper bound, so randint(1, 3) returns 1 or 2.
        poolchoice = numpy.random.randint(1, 3)
        if poolchoice == 1:
            pool = A
        else:
            pool = B

        time = random.choice(pool)  # a random true duration from the selected pool
        # 'Error' the true value by multiplying it by a ratio drawn from a normal
        # distribution with the participant's mean and standard deviation of error.
        estimate = time * numpy.random.normal(ratio_avg, ratio_std)
        percent_error = abs(estimate - time) / time * 100  # percent error of this estimate
        estlist.append(estimate)
        errorlist.append(percent_error)

    estimateavg = sum(estlist) / float(len(estlist))   # average estimate
    erroravg = sum(errorlist) / float(len(errorlist))  # average percent error
    return erroravg / ideal_ratio  # compare with the experimental error; the goal is a value as close to 1 as possible

This uses a normal distribution to generate simulated estimates of a participant's performance based on their error.
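For reference, the same simulation can be written without the CSV and without a Python loop. This is a sketch that assumes the three statistics are passed in directly. The function name `simulate_error_ratio` and its `seed` parameter are hypothetical. It picks either pool with equal probability and then one duration from that pool, and it still uses a normal distribution for the ratio.

```python
import numpy as np

A = [2502, 4376, 6255]  # time pools in milliseconds, as in the code above
B = [3753, 6572, 9374]


def simulate_error_ratio(ratio_avg, ratio_std, ideal_ratio, trials, seed=None):
    """Vectorized sketch of time_and_number with the statistics passed in directly."""
    rng = np.random.default_rng(seed)
    pools = np.array([A, B], dtype=float)                  # shape (2, 3)
    pool_idx = rng.integers(0, 2, size=trials)             # 0 -> A, 1 -> B, equally likely
    item_idx = rng.integers(0, pools.shape[1], size=trials)
    times = pools[pool_idx, item_idx]                      # true durations
    ratios = rng.normal(ratio_avg, ratio_std, size=trials)
    estimates = times * ratios                             # simulated estimates
    percent_errors = np.abs(estimates - times) / times * 100
    return percent_errors.mean() / ideal_ratio             # goal: as close to 1 as possible


print(simulate_error_ratio(0.838986407552044, 0.226132603313837, 24.814422079321, 100_000, seed=1))
```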

The problem is that NumPy's normal distribution is too inflexible. It is always symmetric (zero skew), so our skewed data will not quite fit it, and as a result we expect a systematic overestimation of error.
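The following check shows why a normal distribution cannot absorb the skew in the data. Whatever mean and standard deviation you give it, its skewness is zero. The sketch uses the sample parameters given further down, and the helper `sample_skewness` is only for illustration.

```python
import numpy as np


def sample_skewness(x):
    """Fisher-Pearson (biased) sample skewness."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


rng = np.random.default_rng(0)
ratios = rng.normal(0.838986407552044, 0.226132603313837, size=200_000)
print(sample_skewness(ratios))  # close to 0, regardless of the mean and std used
```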

What I need is a comparable function, but one where I can also supply parameters such as skewness to get a better fit to the data.

Fundamentally, I need a function (or a way to build one) that takes a mean, a standard deviation and a skewness value, and samples a value from that distribution to be multiplied by a time value. This simulates a person making an estimate. Alternatively, I would use a better theoretical distribution for this, as long as it still relies on the mean and standard deviation as parameters.
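One candidate for the first option is the skew-normal distribution. It has a shape parameter in addition to location and scale. The sketch below converts a target mean, standard deviation and skewness into skew-normal parameters by matching moments. It then samples with NumPy using the representation X = xi + omega * (delta * |U0| + sqrt(1 - delta^2) * U1), where U0 and U1 are independent standard normals. The function names are hypothetical, and the skewness of 0.5 in the usage lines is a made-up value, since no skewness figure is given here.

There is one limitation to be aware of. A skew-normal can only reach a skewness of about ±0.995, so more strongly skewed data would need a different family. If SciPy is available, `scipy.stats.skewnorm` accepts the same distribution through its `a` (= delta / sqrt(1 - delta^2)), `loc` (= xi) and `scale` (= omega) parameters.

```python
import math

import numpy as np


def skewnorm_params(mean, std, skew):
    """Method-of-moments skew-normal parameters (xi, omega, delta) for a target mean, std and skew.

    A skew-normal can only produce |skew| below about 0.995.
    """
    max_skew = 0.5 * (4 - math.pi) * (2 / (math.pi - 2)) ** 1.5
    if abs(skew) >= max_skew:
        raise ValueError(f"skew-normal skewness must satisfy |skew| < {max_skew:.4f}")
    g = abs(skew) ** (2 / 3)
    delta = math.copysign(
        math.sqrt((math.pi / 2) * g / (g + ((4 - math.pi) / 2) ** (2 / 3))), skew
    )
    omega = std / math.sqrt(1 - 2 * delta ** 2 / math.pi)  # scale
    xi = mean - omega * delta * math.sqrt(2 / math.pi)     # location
    return xi, omega, delta


def sample_skewed_ratio(mean, std, skew, size, rng):
    """Draw error ratios from a skew-normal with the given mean, std and skewness."""
    xi, omega, delta = skewnorm_params(mean, std, skew)
    u0 = rng.standard_normal(size)
    u1 = rng.standard_normal(size)
    return xi + omega * (delta * np.abs(u0) + math.sqrt(1 - delta ** 2) * u1)


# Usage: one simulated estimate for a 4376 ms presentation (skew of 0.5 is hypothetical)
rng = np.random.default_rng(42)
ratio = sample_skewed_ratio(0.838986407552044, 0.226132603313837, 0.5, 1, rng)[0]
estimate = 4376 * ratio
```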

Since you don't have access to the data, here are some sample numbers if you want to run this yourself and see what it does:

ratio_avg = 0.838986407552044
ratio_std = 0.226132603313837
ideal_ratio = 24.814422079321
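For the second option, a distribution that still uses only the mean and standard deviation, a lognormal might be worth trying. It is always positive, which suits a ratio of estimate to true time, and it is right-skewed. However, its amount of skew is fixed by the ratio std / mean rather than set freely. Whether that shape fits the data is an assumption to check, not something established here. With the sample numbers above, the conversion to the parameters of the underlying normal looks like this (the function name `lognormal_params` is hypothetical):

```python
import math

import numpy as np


def lognormal_params(mean, std):
    """Parameters (mu, sigma) of the underlying normal so that the lognormal has this mean and std."""
    sigma2 = math.log(1 + (std / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)


ratio_avg = 0.838986407552044
ratio_std = 0.226132603313837

mu, sigma = lognormal_params(ratio_avg, ratio_std)
# Skewness implied by these parameters; it is not a free parameter of the lognormal
implied_skew = (math.exp(sigma ** 2) + 2) * math.sqrt(math.exp(sigma ** 2) - 1)

rng = np.random.default_rng(0)
ratios = rng.lognormal(mean=mu, sigma=sigma, size=100_000)
estimate = 4376 * ratios[0]  # one simulated estimate for a 4376 ms presentation
print(mu, sigma, implied_skew)
```

If a participant's measured skewness is close to `implied_skew`, the lognormal may be enough. If it is not, a family with a separate skewness parameter is needed.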

I'm happy to clarify anything further if needed. Thank you to anyone who considers helping.



