Monday, 4 December 2023

Dummy data: generating random text and numerical data in one CSV/Excel file?

I'm trying to generate dummy data with three columns: Sq. feet, Price and Borough. The first two are purely numerical and work fine; I have 50,000 rows of data for both in a spreadsheet. However, when I add Borough and try to fill it with random values from a list, I get the following output:

       Sq. feet    Price  Borough
0           112   345382        5
1           310   901500        5
2           215   661033        5
3           147  1038431        5
4           212   296497        5

For the Borough column I have not used a numerical generator such as np.random.randint.

Instead I used "Borough" : random.randrange(len(word))

Where have I gone wrong?

My code is below:

import random

import pandas as pd
import numpy as np

WORDS = ["Chelsea", "Kensington", "Westminster", "Pimlico", "Bank", "Holborn", "Camden", "Islington", "Angel", "Battersea", "Knightsbridge", "Bermondsey", "Newham"]
word = random.choice(WORDS)
np.random.seed(1)
data3 = pd.DataFrame({"Sq. feet" : np.random.randint(low=75, high=325, size=50000),
                     "Price" : np.random.randint(low=200000, high=1250000, size=50000),
                      "Borough" : random.randrange(len(word))
                     })

df = pd.DataFrame(data3)
df.to_csv("/Users/thomasmcnally/PycharmProjects/real_estate_dummy_date/realestate.csv", index=False)

print(df)

I expected each row to hold a random value drawn from the WORDS list; instead, every row contains just the number 5. It would obviously be pointless to write a separate module just for the text-based data and output it to a different file.
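
A likely explanation, offered as a sketch rather than a definitive answer: random.choice(WORDS) picks a single word, and random.randrange(len(word)) then returns a single integer below that word's length. pandas repeats that one scalar in every row, which would account for the column of 5s. Drawing one value per row from the list itself should give the expected result. The version below assumes np.random.choice is acceptable for the text column; the name N_ROWS and the commented-out output path are illustrative and not part of the original code.

```python
import random

import numpy as np
import pandas as pd

WORDS = ["Chelsea", "Kensington", "Westminster", "Pimlico", "Bank", "Holborn",
         "Camden", "Islington", "Angel", "Battersea", "Knightsbridge",
         "Bermondsey", "Newham"]
N_ROWS = 50000  # illustrative name; same row count as in the question

# What the original expression computes: ONE word, then ONE integer below its
# length. A scalar in the DataFrame dict is repeated in every row.
word = random.choice(WORDS)
single_value = random.randrange(len(word))

np.random.seed(1)
df = pd.DataFrame({
    "Sq. feet": np.random.randint(low=75, high=325, size=N_ROWS),
    "Price": np.random.randint(low=200000, high=1250000, size=N_ROWS),
    # One independent draw per row, taken from the list itself.
    "Borough": np.random.choice(WORDS, size=N_ROWS),
})

print(df.head())
# df.to_csv("realestate.csv", index=False)  # hypothetical output path
```

If the standard library is preferred over NumPy for this column, random.choices(WORDS, k=N_ROWS) also returns one value per row, although np.random.seed would not make that column reproducible; random.seed would be needed as well.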



