Saturday, 30 June 2018

Shuffling the rows of a pandas DataFrame in Python gives different regression results?

I am trying to randomise the rows of my DataFrame, data, before applying linear regression, but I noticed that the regression results differ after the rows are randomised, which shouldn't be the case. The code I have tried:

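Without row randomisation:
from sklearn.linear_model import LinearRegression

# data is the DataFrame; feature_col holds the feature column names (defined elsewhere)
X = data[feature_col]
y = data['median_price']
lr = LinearRegression()
lr.fit(X, y)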

With row randomisation: 
Method 1: 
data = data.sample(frac=1)

Method 2:
data = data.sample(frac=1, axis=1)

Method 3: 
from sklearn.utils import shuffle
data = shuffle(data)

Method 4: 
data = data.sample(frac=1, axis=1)

Of the 4 row randomisation methods I have tried, only Method 4 gives the same results as the fit without randomisation. I thought row randomisation would not affect the regression results in any case?
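
For reference, here is a minimal sketch of how the comparison could be checked on synthetic data. The feature names ("rooms", "area"), the data itself and the helper fit_coefs are made up for illustration and are not from the original dataset. It relies on two facts: ordinary least squares does not depend on row order as long as X and y are taken from the same shuffled frame, and sample(frac=1, axis=1) permutes columns rather than rows.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the real data; column names and values are assumptions.
rng = np.random.default_rng(0)
feature_col = ["rooms", "area"]
data = pd.DataFrame(rng.normal(size=(200, 2)), columns=feature_col)
data["median_price"] = (
    3.0 * data["rooms"] - 2.0 * data["area"] + rng.normal(scale=0.1, size=200)
)


def fit_coefs(df):
    """Fit ordinary least squares and return (coefficients, intercept)."""
    lr = LinearRegression().fit(df[feature_col], df["median_price"])
    return lr.coef_, lr.intercept_


base_coef, base_intercept = fit_coefs(data)

# axis=0 (the default) permutes rows. X and y are both taken from the
# shuffled frame, so every feature row stays paired with its own target.
rows_shuffled = data.sample(frac=1, random_state=1)
row_coef, row_intercept = fit_coefs(rows_shuffled)

# axis=1 permutes columns, not rows. Selecting columns by name afterwards
# yields the same X and y as before.
cols_shuffled = data.sample(frac=1, axis=1, random_state=1)
col_coef, col_intercept = fit_coefs(cols_shuffled)

# Compare with a tolerance: summation order can change the last few digits.
rows_match = bool(
    np.allclose(base_coef, row_coef) and np.isclose(base_intercept, row_intercept)
)
cols_match = bool(
    np.allclose(base_coef, col_coef) and np.isclose(base_intercept, col_intercept)
)
print(rows_match, cols_match)
```

If a row shuffle changes the coefficients by more than floating-point noise, one possible cause (an assumption, since the full script is not shown) is that X or y was extracted before the shuffle and the other after it. scikit-learn pairs rows by position, not by index label, so that would mismatch features and targets.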



