I'm trying to generate dummy data with three columns: Sq. feet, Price and Borough. The first two are purely numerical and work fine: I get 50,000 rows for both in a spreadsheet. However, when I add Borough and try to fill it with random values from a list, I get the following output:
   Sq. feet    Price  Borough
0       112   345382        5
1       310   901500        5
2       215   661033        5
3       147  1038431        5
4       212   296497        5
For the Borough column I have not used a NumPy generator such as np.random.randint.
Instead I used "Borough" : random.randrange(len(word))
Where have I gone wrong?
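A quick check of what that expression produces on its own may help narrow things down. The sketch below is only illustrative: the shortened WORDS list and the names value and demo are made up for this check and are not part of my script. It shows that random.randrange(len(word)) yields a single integer based on the length of one chosen word, and that pandas repeats a single scalar down the whole column.

```python
import random

import pandas as pd

# Shortened, illustrative list (the real script uses 13 names).
WORDS = ["Chelsea", "Kensington", "Westminster"]

word = random.choice(WORDS)          # one string, picked once
value = random.randrange(len(word))  # one int in the range [0, len(word))

print(repr(word), value, type(value).__name__)

# A scalar in the dict is broadcast to every row of the DataFrame.
demo = pd.DataFrame({"n": [1, 2, 3], "Borough": value})
print(demo)
```

So the number in the Borough column appears to be an index into the letters of one word, drawn once, rather than a fresh borough name per row.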
My code is below:
import random
import pandas as pd
import numpy as np
WORDS = ["Chelsea", "Kensington", "Westminster", "Pimlico", "Bank", "Holborn", "Camden", "Islington", "Angel", "Battersea", "Knightsbridge", "Bermondsey", "Newham"]
word = random.choice(WORDS)
np.random.seed(1)
data3 = pd.DataFrame({"Sq. feet" : np.random.randint(low=75, high=325, size=50000),
                      "Price" : np.random.randint(low=200000, high=1250000, size=50000),
                      "Borough" : random.randrange(len(word))
                      })
df = pd.DataFrame(data3)
df.to_csv("/Users/thomasmcnally/PycharmProjects/real_estate_dummy_date/realestate.csv", index=False)
print(df)
I'm expecting each row to contain a random value from the WORDS list; instead, every row just contains the number 5. I'd rather not write a separate module just for the text-based data and output it to a different file.
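For reference, here is a minimal sketch of one possible way to get the result I'm after, assuming np.random.choice with a size argument is acceptable for drawing one name per row. The constant N_ROWS and the variable name data are introduced just for this example, and the CSV export is left out so the snippet runs anywhere.

```python
import numpy as np
import pandas as pd

WORDS = ["Chelsea", "Kensington", "Westminster", "Pimlico", "Bank", "Holborn",
         "Camden", "Islington", "Angel", "Battersea", "Knightsbridge",
         "Bermondsey", "Newham"]

N_ROWS = 50000  # hypothetical constant, matches the size used above

np.random.seed(1)
data = pd.DataFrame({
    "Sq. feet": np.random.randint(low=75, high=325, size=N_ROWS),
    "Price": np.random.randint(low=200000, high=1250000, size=N_ROWS),
    # One independent draw from WORDS for every row, not a single scalar.
    "Borough": np.random.choice(WORDS, size=N_ROWS),
})

print(data.head())
```

The difference from my version is that the Borough entry is an array of N_ROWS names rather than one integer, so all three columns come out with a separate value on each row in the same DataFrame and the same file.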