Friday, 7 October 2022

Why are my randomly picked datetimes not equally distributed?

I noticed something odd in my randomly generated timestamps. I have an application that generates artificial log data, and I would like to be able to define the time range it covers. I therefore wrote code like this:

# imports
from datetime import datetime
import time
from random import choice

timestamps = []
timerange_in_days = 14 # how many days back from today should my timestamps cover?
entries = 10000 # how many timestamps?

for _ in range(entries):
    
    last_midnight = (int(time.time() // 86400)) * 86400  # find date border
    days = range(1, timerange_in_days + 1)  # set the range
    timestamp = last_midnight - (choice(days) * choice(range(1, 25)) * 3600)  # create the timestamp
    timestamp = datetime.fromtimestamp(timestamp).isoformat(timespec='milliseconds')  # format it
    timestamps.append(timestamp)

I then wrote the timestamps to a file and plotted them in R, as I couldn't get them visualized quickly in Python. I plotted one histogram by day and one by hour. The little bar for October 8 comes from the timezone not being adjusted, meaning the data runs until 2 am of the next day.


with open(r'/path/to/file/dates.txt', 'w') as myfile:
    for item in timestamps:
        myfile.write("%s\n" % item)

# in R
path <- "path/to/file"
dates <- data.table::fread(file.path(path, "dates.txt"))  # recognizes as POSIXct automatically
hist(dates$V1, "days")

[Image: histogram of the timestamps by day]

hist(dates$V1, "hours")

[Image: histogram of the timestamps by hour]

But my question is: why are the timestamps more frequent around "now"? I want them to be spread equally across the days.
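A quick way to check where the skew might come from is sketched below, assuming the multiplication `choice(days) * choice(range(1, 25))` is responsible. The offset in hours is then a product of two uniform picks, and a product like that lands on small values far more often than on large ones, which would put most timestamps close to last midnight. The helper names (`product_offsets`, `uniform_offsets`, `counts_per_day`) are made up for this illustration and are not part of the code above; the uniform variant is only one possible alternative.

```python
from collections import Counter
from random import Random

HOURS_PER_DAY = 24


def product_offsets(n, days, rng):
    """Offsets in hours built like the original line: day pick * hour pick."""
    return [rng.randint(1, days) * rng.randint(1, HOURS_PER_DAY) for _ in range(n)]


def uniform_offsets(n, days, rng):
    """Offsets in hours drawn uniformly from the whole range 1..days*24."""
    return [rng.randint(1, days * HOURS_PER_DAY) for _ in range(n)]


def counts_per_day(offsets):
    """Count offsets per day bucket; bucket 0 is the day closest to last midnight."""
    return Counter((h - 1) // HOURS_PER_DAY for h in offsets)


rng = Random(0)  # fixed seed so the comparison is repeatable
skewed = counts_per_day(product_offsets(10000, 14, rng))
even = counts_per_day(uniform_offsets(10000, 14, rng))

# Print the per-day counts side by side: product-based vs. uniform offsets
for day in range(14):
    print(day, skewed[day], even[day])
```

With a uniform offset, a timestamp could then be built as `last_midnight - offset * 3600`, keeping the rest of the loop unchanged.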



