Tuesday, 14 March 2017

Sampling of San Francisco crime data on the basis of category

I am working on the San Francisco crime data, and I want to divide my dataset into five sub-datasets. Which sampling method should I use to get better predictions? I would also like to know what other ways of sampling the data are available.

These are the category frequencies in the train dataset:

Category | Freq
------ | ------
TREA | 6
PORNOGRAPHY/OBSCENE MAT | 22
GAMBLING | 146
SEX OFFENSES NON FORCIBLE | 148
BRIBERY | 289
BAD CHECKS | 406
FAMILY OFFENSES | 491
SUICIDE | 508
EMBEZZLEMENT | 1166
LOITERING | 1225
ARSON | 1513
LIQUOR LAWS | 1903
DRIVING UNDER THE INFLUENCE | 2268
KIDNAPPING | 2341
RECOVERED VEHICLE | 3138
DRUNKENNESS | 4280
DISORDERLY CONDUCT | 4320
SEX OFFENSES FORCIBLE | 4388
STOLEN PROPERTY | 4540
TRESPASS | 7326
PROSTITUTION | 7484
WEAPON LAWS | 8555
SECONDARY CODES | 9985
FORGERY/COUNTERFEITING | 10609
FRAUD | 16679
ROBBERY | 23000
MISSING PERSON | 25989
SUSPICIOUS OCC | 31414
BURGLARY | 36755
WARRANTS | 42214
VANDALISM | 44725
VEHICLE THEFT | 53781
DRUG/NARCOTIC | 53971
ASSAULT | 76876
NON-CRIMINAL | 92304
OTHER OFFENSES | 126182
LARCENY/THEFT | 174900
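For reference, a frequency table like the one above can be produced in base R roughly as follows. This is only a minimal sketch: the data frame name `train` and the column name `Category` are assumptions, and the six rows are a made-up stand-in for the real file.

```r
# Toy stand-in for the training data; `train` and `Category` are assumed names.
train <- data.frame(
  Category = c("ASSAULT", "ARSON", "ASSAULT", "FRAUD", "ASSAULT", "FRAUD"),
  stringsAsFactors = FALSE
)

# Count rows per category and sort ascending, as in the table above.
freq <- sort(table(train$Category))

# Convert to a two-column data frame for easier printing or export.
freq_df <- data.frame(
  Category = names(freq),
  Freq = as.integer(freq),
  stringsAsFactors = FALSE
)
print(freq_df)
```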

PS: I am using R.
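To make the question concrete, here is a sketch of one possible way to do the split: stratified assignment, where each category is dealt out across the five sub-datasets separately so that every sub-dataset keeps roughly the same category proportions. The names `train`, `Category`, `fold`, `assign_folds`, and `subsets` are assumptions for illustration, and the data is a small made-up stand-in, not the real file.

```r
set.seed(42)  # arbitrary seed, only so the split is reproducible

# Toy stand-in data; `train` and `Category` are assumed names.
train <- data.frame(
  Category = rep(c("LARCENY/THEFT", "ASSAULT", "TREA"), times = c(50, 20, 6)),
  stringsAsFactors = FALSE
)

k <- 5  # number of sub-datasets

# Build the labels 1..k repeated to length n, then shuffle them, so a
# category with n rows is spread as evenly as possible over the k folds.
assign_folds <- function(n, k) {
  sample(rep_len(seq_len(k), n))
}

# Apply the assignment within each category (stratification).
train$fold <- ave(seq_len(nrow(train)), train$Category,
                  FUN = function(idx) assign_folds(length(idx), k))

# Split into a list of k data frames, one per fold.
subsets <- split(train, train$fold)

# Check: category counts per fold.
print(table(train$Category, train$fold))
```

With a plain random split, a category as rare as TREA (6 rows) could easily be missing from some sub-datasets; stratifying by category avoids that as long as the category has at least five rows. The `caret` package's `createFolds()` offers a similar stratified split, if adding a dependency is acceptable.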



