Sunday, 25 November 2018

How to generate a large dataset and randomize it using a Python DataFrame

I have written a program that generates a large dataset and randomizes it according to some conditions. Please go through the whole program and the conditions listed below; if anything is unclear, please ping me.

Input data:

Asset_Id  Asset Family  Asset Name  Location    Asset Component          Keywords                       
 1     Haptic Analy     HAL1        Zenoa       Tetris Measuring Unit    Measurement Inaccuracy,    
 1     Haptic Analy     HAL2        Zenoa       Micro Pressure Platform  Low air pressure,                      
 1     Haptic Analy     HAL3        Technopolis Rotation Chamber         Rotation Chamber Intermittent Slowdown
 1     Haptic Analy     HAL4        Technopolis Mirror Lens Combinator   Mirror Angle Skewed,           
 2     Hyperdome Insp   HISP1       Technopolis Laser Column             Column Alignment Issues,       
 2     Hyperdome Insp   HISP2       Zenoa       Turbo Quantifier         Quantifier output Drops Intermittently 
 2     Hyperdome Insp   HISP3       Technopolis Generator                Generator          
 2     Hyperdome Insp   HISP4       Zenoa       High Frequency Emulator  Emulator Frequency Drop            

 3     Nano Dial Assem  NDA11       Zenoa       Fusion Tank              Fall in Diffusion Ratio            
 3     Nano Dial Assem  NDA12       Zenoa       Dial Loading Unit        Faulty Scanner Unit            
 3     Nano Dial Assem  NDA13       Zenoa       Vaccum Line Control      Above Normal 
 3     Nano Dial Assem  NDA14       Zenoa       Wave Generator           Generator Power Failure
 4     Geometric Synth  GeoSyn22    La Puente   Scanning Electronic      Faulty Scanner Unit            
 4     Geometric Synth  GeoSyn23    La Puente   Draft Synthesis Chamber  Beam offset beyond Tolerance       
 4     Geometric Synth  GeoSyn24    La Puente   Progeometric Plane       Progeometric Plane Fault Detected  
 4     Geometric Synth  GeoSyn25    La Puente   Ion Gas Diffuser         Column Alignment Issues    

CONDITIONS:

1) The data should be read from a CSV file, and the whole dataset should be randomized.
2) The "Location" column should also be randomized separately and printed along with the rest of the randomized data.
3) More than 30k rows should be generated from the given data.
4) Important: the "Asset Component" column should also be randomized separately, but only within each "Asset Family". The components that belong to "Haptic Analyser" must not mix with those of "Hyperdome Inspector", "Nano Dial Assembler", and so on. In other words, after shuffling, every "Asset Component" value must still sit in a row of its original "Asset Family".

If you have any doubt about the 4th condition, please let me know.

For this I have written a program that satisfies the first three conditions:
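Condition 2 has a pitfall that is easy to miss, so here is a minimal sketch of it in isolation. The small DataFrame is a hypothetical stand-in for a few seed rows, and `random_state` is only there to make the run repeatable. The point is that pandas aligns a Series on its index when it is assigned or joined, so a shuffled column has to be put back by position.

```python
import pandas as pd

# Hypothetical stand-in for a few seed rows
df = pd.DataFrame({
    "Asset Name": ["HAL1", "HAL3", "HISP1", "GeoSyn22"],
    "Location": ["Zenoa", "Technopolis", "Technopolis", "La Puente"],
})

# Shuffle Location on its own (random_state only makes the run repeatable)
shuffled = df["Location"].sample(frac=1, random_state=0)

# Assigning the Series aligns on the index, so every value goes back to
# its original row and nothing changes
aligned = df.copy()
aligned["Location"] = shuffled

# Assigning the underlying array ignores the index and keeps the new order
positional = df.copy()
positional["Location"] = shuffled.to_numpy()
```

Here `aligned` ends up identical to `df`, while `positional` carries the values in their shuffled order.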

import pandas as pd


def main():
    # Read the seed data
    df = pd.read_csv("C:\\Users\\rahul\\Desktop\\Data Manufacturing - Seed Data.csv")

    # Randomise the whole data (shuffle all rows)
    ds = df.sample(frac=1)

    # Randomise the Location column on its own
    randValue = df["Location"].sample(frac=1)

    # Put the shuffled column back by position. A join (or assigning the
    # Series directly) aligns on the index, which would give every row its
    # original Location again.
    result = ds.copy()
    result["Location"] = randValue.to_numpy()

    # Keep the columns in a fixed order
    result = result[['Asset_Id ', 'Asset Family', 'Asset Name', 'Location', 'Asset Component','Keywords','Conditions','Parts','No. of Parts','SR_Id','SR_Date','SR_Month','SR_Year']]

    # Now randomise the whole data again
    ds1 = result.sample(frac=1)

    # Generate the large dataset (501 copies of the rows) and randomise it.
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    dd = pd.concat([ds1] * 501)
    ds2 = dd.sample(frac=1)
    print(ds2)

    # Save the large dataset
    ds2.to_csv('C:\\Users\\rahul\\Desktop\\people1.csv')


if __name__ == '__main__':
    main()
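A fixed number of copies only satisfies condition 3 if the seed file is large enough. With just the 16 seed rows shown above, for example, 501 copies give 8,016 rows, and more than 30,000 rows would need at least 1,876 copies. The sketch below derives the number of copies from the target row count instead. The helper name `replicate_to_at_least` and the tiny seed frame are hypothetical, and the seed is assumed to be non-empty.

```python
import math

import pandas as pd


def replicate_to_at_least(seed: pd.DataFrame, min_rows: int) -> pd.DataFrame:
    """Repeat the seed rows until there are at least min_rows, then shuffle."""
    copies = math.ceil(min_rows / len(seed))  # assumes len(seed) > 0
    big = pd.concat([seed] * copies, ignore_index=True)
    return big.sample(frac=1).reset_index(drop=True)


# Hypothetical 4-row seed; the real data would come from the CSV file
seed = pd.DataFrame({
    "Asset_Id": [1, 1, 2, 2],
    "Asset Name": ["HAL1", "HAL2", "HISP1", "HISP2"],
})

big = replicate_to_at_least(seed, 30_001)
print(len(big))  # 30004
```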

This program generates a large dataset, randomizes it, and also randomizes the "Location" column. The only thing I am not able to do is the 4th condition: "Asset Component" should be randomized, but according to the "Asset Family" column, so that the components of "Haptic Analyser" and "Hyperdome Inspector" do not mix with each other and are printed separately.
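One possible way to approach the 4th condition is to shuffle "Asset Component" inside each "Asset Family" group, so that a component can only move to another row of the same family. The following is a minimal sketch under that reading of the condition, not a tested solution for the full file. The helper name `shuffle_within_family` is hypothetical, and the small DataFrame reuses a few of the seed rows shown above.

```python
import pandas as pd

# A few of the seed rows, reduced to the columns that matter here
df = pd.DataFrame({
    "Asset Family": ["Haptic Analy"] * 4 + ["Hyperdome Insp"] * 4,
    "Asset Name": ["HAL1", "HAL2", "HAL3", "HAL4",
                   "HISP1", "HISP2", "HISP3", "HISP4"],
    "Asset Component": [
        "Tetris Measuring Unit", "Micro Pressure Platform",
        "Rotation Chamber", "Mirror Lens Combinator",
        "Laser Column", "Turbo Quantifier",
        "Generator", "High Frequency Emulator",
    ],
})


def shuffle_within_family(frame, column="Asset Component", by="Asset Family"):
    """Shuffle `column` separately inside every `by` group."""
    out = frame.copy()
    # transform() calls the function once per group; returning a plain array
    # puts the shuffled values back by position within that group
    out[column] = frame.groupby(by)[column].transform(
        lambda s: s.sample(frac=1).to_numpy()
    )
    return out


result = shuffle_within_family(df)
print(result)
```

Because the shuffle never leaves a group, each family keeps exactly the same set of components as before; only their order across the rows of that family changes.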

The output data:

Asset_Id  Asset Family     Asset Name  Location     Asset Component        Keywords
3         Nano Dial Assem  NDA11       Zenoa        Fusion Tank            Fall in Diffusion Ratio
1         Haptic Analy     HAL3        Technopolis  Rotation Chamber       Rotation Chamber Intermittent Slowdown
2         Hyperdome Insp   HISP2       Zenoa        Turbo Quantifier       Quantifier output Drops Intermittently
4         Geometric Synth  GeoSyn25    La Puente    Ion Gas Diffuser       Column Alignment Issues
1         Haptic Analy     HAL1        Zenoa        Tetris Measuring Unit  Measurement Inaccuracy,
2         Hyperdome Insp   HISP1       Technopolis  Laser Column           Column Alignment Issues,
3         Nano Dial Assem  NDA14       Zenoa        Wave Generator         Generator Power Failure
4         Geometric Synth  GeoSyn24    La Puente    Progeometric Plane     Progeometric Plane Fault Detected

In this output the first three conditions are met; only the 4th condition I am not able to do. Please help me get it. Thanks in advance.

Note: please go through all my conditions before coming to the code. If you are not able to understand any point, please leave a comment. Thanks.



