My problem is to find the best-fit distribution for my data (assume I have already extracted a column from a DataFrame). After finding the best-fit distribution, I have to generate random numbers from it.
import numpy as np
import scipy.stats as st

def bestFitDist(dist_list):
    distributions = [st.beta,
                     st.expon,
                     st.gamma,
                     st.lognorm,
                     st.norm,
                     st.pearson3,
                     st.triang,
                     st.uniform,
                     st.weibull_min,
                     st.weibull_max,
                     st.laplace,
                     st.exponpow
                     ]
    mles = []
    for distribution in distributions:
        pars = distribution.fit(dist_list)
        mle = distribution.nnlf(pars, dist_list)
        mles.append(mle)
    results = [(distribution.name, mle) for distribution, mle in zip(distributions, mles)]
    best_fit = sorted(zip(distributions, mles), key=lambda d: d[1])[0]
    #print ('Best fit reached using {}, MLE value: {}'.format(best_fit[0].name, best_fit[1]))
    return best_fit[1]
I wrote this function to find the best-fit distribution, but I don't understand how to generate random numbers based on its return value.
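For illustration, here is a minimal sketch of the sampling step on its own, assuming the fitted parameters are kept alongside the distribution object rather than only the likelihood value. The helper name fit_and_sample and the synthetic data are hypothetical; fit and rvs are the standard scipy.stats methods.

```python
import numpy as np
import scipy.stats as st

def fit_and_sample(data, distribution, size, seed=None):
    """Fit one scipy.stats distribution to data, then draw random values from the fit."""
    # fit() returns the shape parameters (if any) followed by loc and scale
    pars = distribution.fit(data)
    rng = np.random.default_rng(seed)
    # rvs() accepts the fitted parameters in the same order fit() returned them
    samples = distribution.rvs(*pars, size=size, random_state=rng)
    return pars, samples

# Synthetic stand-in for the extracted column (an assumption for this example)
data = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=500)
pars, samples = fit_and_sample(data, st.norm, size=10000, seed=1)
```

The point of the sketch is that random generation needs both the distribution and its fitted parameters, so a function that returns only the negative log-likelihood value does not carry enough information.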
The MATLAB code for this problem looks something like this (ignore isMonth & sensorData.isLoad & isValid; Pratio is the column for which I have to find the best distribution and then generate random values, rPratio):
NSEED=10000;
[D, PD] = allfitdist(Pratio(isMonth & sensorData.isLoad & isValid), 'AIC');
ksd_Pratio = PD{1};
rPratio = random(ksd_Pratio,NSEED,1);
I have to convert this logic to PySpark.
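A rough Python counterpart of the MATLAB logic might look like the sketch below. It is plain SciPy, not PySpark, and assumes the column has already been collected into a NumPy array. The helper best_fit_by_aic is hypothetical, the candidate list is shortened, and the AIC is computed as 2k + 2·NLL with k counting every fitted parameter (including loc and scale), which may not match how allfitdist counts parameters.

```python
import numpy as np
import scipy.stats as st

def best_fit_by_aic(data, distributions):
    """Return (distribution, fitted parameters, AIC) for the lowest-AIC candidate."""
    best = None
    for distribution in distributions:
        pars = distribution.fit(data)
        nll = distribution.nnlf(pars, data)   # negative log-likelihood at the fit
        aic = 2 * len(pars) + 2 * nll         # AIC = 2k + 2*NLL
        if np.isfinite(aic) and (best is None or aic < best[2]):
            best = (distribution, pars, aic)
    return best

NSEED = 10000
rng = np.random.default_rng(0)
# Synthetic stand-in for Pratio (an assumption for this example)
pratio = rng.gamma(shape=2.0, scale=1.5, size=300)

candidates = [st.norm, st.expon, st.gamma, st.lognorm]
dist, pars, aic = best_fit_by_aic(pratio, candidates)

# Counterpart of rPratio = random(ksd_Pratio, NSEED, 1)
r_pratio = dist.rvs(*pars, size=NSEED, random_state=rng)
```

Here NSEED plays the same role as in the MATLAB snippet: it is the number of random values drawn, not a random seed.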