When given a structure of one record, how can I randomly simulate n records of the same structure?
Example
Consider that I have an array of records such as:
[
  {
    "id": 12345,
    "createdAt": "2021-12-25",
    "data": {
      "age": {"value": 25},
      "height": {"value": 100},
      "weight": {"value": 160},
      "n_of_kids": {"value": 0},
      "fam_status": {"value": "married"},
      "preferred_pet": {"value": "dog"},
      "preferred_color": {"value": "purple"},
      "preferred_movie": {"value": "titanic"}
    }
  },
  {...} // another record
]
My task: I want to simulate an array of n records of the same structure as the one above.
Note: I specifically want a solution that works for any given structure. So while I'm aware that the structure given here is sub-optimal (e.g., the redundant value property doesn't add much), I still want to be able to account for any possible structure.
One way I can approach this is by creating an object whose values are regular expressions that specify what each value should look like.
const structureTemplateRegex = {
  id: /^[0-9]{5}$/, // 5-digit number
  createdAt: /^\d{4}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$/, // yyyy-mm-dd
  data: {
    age: { value: /^(?:[0-9]|[1-9][0-9]|100)$/ }, // 0-100
    height: { value: /^(?:1[0-9]|[2-9][0-9]|1[0-9]{2}|2[01][0-9]|220)$/ }, // 10-220
    weight: { value: /^(?:3[0-9]|[4-9][0-9]|[12][0-9]{2}|300)$/ }, // 30-300
    n_of_kids: { value: /^[0-7]$/ }, // 0-7
    fam_status: { value: /^(married|single|divorced|widowed)$/ },
    preferred_pet: { value: /^(dog|cat|hamster|fish|rabbit|zebra)$/ },
    preferred_color: { value: /^(red|green|yellow|black|orange|blue)$/ },
    preferred_movie: {
      value: /^(titanic|alien|se7en|batman|goodfellas|argo)$/,
    },
  },
};
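For checking an existing record, a template like this can be walked recursively. The following is only a minimal sketch, assuming the leaves are actual RegExp objects (not strings); `matchesTemplate` and `miniTemplate` are hypothetical names I made up for illustration, and the template is trimmed down to two fields.

```javascript
// Hypothetical helper: checks a record against a template whose leaves are RegExp objects.
const matchesTemplate = (template, record) =>
  Object.entries(template).every(([key, rule]) => {
    const value = record[key];
    // Leaf: test the stringified value against the pattern
    if (rule instanceof RegExp) return rule.test(String(value));
    // Nested object: recurse only when the record has an object here too
    return typeof value === "object" && value !== null && matchesTemplate(rule, value);
  });

// Trimmed-down template, for illustration only
const miniTemplate = {
  id: /^[0-9]{5}$/,
  data: { fam_status: { value: /^(married|single|divorced|widowed)$/ } },
};

const good = { id: 12345, data: { fam_status: { value: "married" } } };
const bad = { id: 123, data: { fam_status: { value: "married" } } };

console.log(matchesTemplate(miniTemplate, good)); // true
console.log(matchesTemplate(miniTemplate, bad)); // false (id is not 5 digits)
```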
Well, structureTemplateRegex might be good for validation, but not for generating data. So another way to approach the problem is to write a generator function for each property in the record.
const generateId = (n = 5) => [...Array(n)].map(_=>Math.random()*10|0).join`` // returns a string of n digits, which may start with 0; https://stackoverflow.com/a/70598339/6105259
const generateDate = (start = new Date(2018, 8, 9), end = new Date(2021, 11, 15)) => new Date(start.getTime() + Math.random() * (end.getTime() - start.getTime())).toISOString().slice(0,10); // months are 0-indexed, so 8 is September and 11 is December; https://stackoverflow.com/a/39472913/6105259
const randomInteger = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min; // https://stackoverflow.com/a/29246176/6105259
const randomElement = (arr) => arr[(Math.random() * arr.length) | 0] // https://stackoverflow.com/a/38448710/6105259
const generateFamStatus = () => randomElement(["married", "single", "divorced", "widowed"])
const generatePet = () => randomElement(["dog", "cat", "hamster", "fish", "rabbit", "zebra"])
const generateColor = () => randomElement(["red", "green", "yellow", "black", "orange", "blue"])
const generateMovie = () => randomElement(["titanic", "alien", "se7en", "batman", "goodfellas", "argo"])
// and then
const structureTemplateGenerators = {
  id: generateId(), // 5-digit number
  createdAt: generateDate(), // yyyy-mm-dd
  data: {
    // randomInteger includes both bounds, so max is the largest allowed value
    age: { value: randomInteger(0, 100) }, // 0-100
    height: { value: randomInteger(10, 220) }, // 10-220
    weight: { value: randomInteger(30, 300) }, // 30-300
    n_of_kids: { value: randomInteger(0, 7) }, // 0-7
    fam_status: { value: generateFamStatus() },
    preferred_pet: { value: generatePet() },
    preferred_color: { value: generateColor() },
    preferred_movie: {
      value: generateMovie(),
    },
  },
};
But I'm not sure how to proceed from here. I have the materials I need, but not the technique. Essentially, I want to call a function that takes two parameters: (1) a structure of one representative record, and (2) n, the number of records to simulate. The function would return an array of length n with randomly generated records.
// pseudocode
generateRecords(structureTemplateGenerators, 5) // but `n` could potentially be 10 or 10000 or 3e7
// would return
const possibleOutput = [
{
id: 12045,
createdAt: '2021-02-21',
data: {
age: { value: 15 },
height: { value: 80 },
weight: { value: 100 },
n_of_kids: { value: 1 },
fam_status: { value: 'widowed' },
preferred_pet: { value: 'dog' },
preferred_color: { value: 'purple' },
preferred_movie: { value: 'se7en' },
},
},
{
id: 39847,
createdAt: '2020-12-02',
data: {
age: { value: 33 },
height: { value: 56 },
weight: { value: 210 },
n_of_kids: { value: 3 },
fam_status: { value: 'married' },
preferred_pet: { value: 'zebra' },
preferred_color: { value: 'blue' },
preferred_movie: { value: 'argo' },
},
},
{
id: 22435,
createdAt: '2018-10-10',
data: {
age: { value: 25 },
height: { value: 103 },
weight: { value: 165 },
n_of_kids: { value: 5 },
fam_status: { value: 'married' },
preferred_pet: { value: 'dog' },
preferred_color: { value: 'green' },
preferred_movie: { value: 'titanic' },
},
},
{
id: 61194,
createdAt: '2019-04-10',
data: {
age: { value: 20 },
height: { value: 90 },
weight: { value: 100 },
n_of_kids: { value: 3 },
fam_status: { value: 'divorced' },
preferred_pet: { value: 'hamster' },
preferred_color: { value: 'blue' },
preferred_movie: { value: 'batman' },
},
},
{
id: 22231,
createdAt: '2021-10-01',
data: {
age: { value: 77 },
height: { value: 160 },
weight: { value: 69 },
n_of_kids: { value: 1 },
fam_status: { value: 'divorced' },
preferred_pet: { value: 'dog' },
preferred_color: { value: 'red' },
preferred_movie: { value: 'titanic' },
},
},
];
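One possible direction for producing output like this, sketched under an assumption that differs from my template above: the leaves hold functions rather than already-generated values. Calling a generator inside the object literal evaluates it once, so every record would otherwise share the same values. `instantiate` and `templateOfFunctions` are hypothetical names, and the template is shortened to three fields.

```javascript
const randomInteger = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min; // both bounds inclusive
const randomElement = (arr) => arr[(Math.random() * arr.length) | 0];

// Assumption: leaves are functions, so each record gets freshly generated values
const templateOfFunctions = {
  id: () => randomInteger(10000, 99999),
  data: {
    age: { value: () => randomInteger(0, 100) },
    fam_status: { value: () => randomElement(["married", "single", "divorced", "widowed"]) },
  },
};

// Hypothetical helper: walks any nested template and calls every function leaf
const instantiate = (node) => {
  if (typeof node === "function") return node();
  if (Array.isArray(node)) return node.map((item) => instantiate(item));
  if (node !== null && typeof node === "object") {
    return Object.fromEntries(Object.entries(node).map(([key, value]) => [key, instantiate(value)]));
  }
  return node; // constants (numbers, strings, null) are copied as-is
};

const generateRecords = (template, n) => Array.from({ length: n }, () => instantiate(template));

const records = generateRecords(templateOfFunctions, 5);
console.log(JSON.stringify(records[0], null, 2));
```

Because the walk is driven by the template's own shape, it should not care how deeply the record is nested. For very large n, holding the whole array in memory may become the limiting factor, in which case yielding records one at a time from a generator function could be a better fit.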
More Context
I have a JavaScript application that accepts an array of records and runs a series of aggregate calculations over that data. Ultimately, the app returns JSON-like output. Knowing the structure of the input records, I want to randomly simulate an array containing n records. Such a procedure would let me test my app in terms of both (1) the correctness of its calculations and (2) speed (as n increases).
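To make the speed part concrete, here is a rough sketch of the kind of measurement I have in mind. Both `makeRecord` and `meanAge` are hypothetical stand-ins (my real app does more than one average), and I'm assuming a runtime where `performance.now()` is available globally, such as a browser or a recent Node.js.

```javascript
// Hypothetical stand-ins for the record generator and for the app's aggregation
const makeRecord = () => ({ data: { age: { value: Math.floor(Math.random() * 101) } } });
const meanAge = (records) => records.reduce((sum, r) => sum + r.data.age.value, 0) / records.length;

// Builds n records, then times only the aggregation step
const timeAggregation = (n) => {
  const records = Array.from({ length: n }, makeRecord);
  const start = performance.now();
  const result = meanAge(records);
  return { n, result, ms: performance.now() - start };
};

for (const n of [10, 10000, 1000000]) {
  console.log(timeAggregation(n));
}
```

A single run per n is noisy, so repeating each measurement a few times and comparing medians would probably give a fairer picture.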