Friday, May 6, 2022

How to randomly simulate an array of records based on a representative record and property-specific generator functions?

When given a structure of one record, how can I randomly simulate n records of the same structure?

Example

Consider that I have an array of records such as:

[
    {
        "id": 12345,
        "createdAt": "2021-12-25",
        "data": {
            "age": {"value": 25},
            "height": {"value": 100},
            "weight": {"value": 160},
            "n_of_kids": {"value": 0},
            "fam_status": {"value": "married"},
            "preferred_pet": {"value": "dog"},
            "preferred_color": {"value": "purple"},
            "preferred_movie": {"value": "titanic"}
        }
    },
    {...} // another record
]

My task: I want to simulate an array of n records of the same structure as the one above.

Note. I specifically want to find a solution that would work for any given structure. So while I'm aware that the structure given here is sub-optimal (e.g., the redundant value property doesn't add much), I still want to be able to account for any possible given structure.


One way I can approach this is by creating an object whose values are regular expressions that specify what each value should look like.

const structureTemplateRegex = {
  id: /^[0-9]{5}$/, // 5 digits
  createdAt: /^\d{4}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$/, // yyyy-mm-dd
  data: {
    age: { value: /^(?:[0-9]|[1-9][0-9]|100)$/ }, // 0-100
    height: { value: /^(?:1[0-9]|[2-9][0-9]|1[0-9]{2}|2[01][0-9]|220)$/ }, // 10-220
    weight: { value: /^(?:3[0-9]|[4-9][0-9]|[12][0-9]{2}|300)$/ }, // 30-300
    n_of_kids: { value: /^[0-7]$/ }, // 0-7
    fam_status: { value: /^(married|single|divorced|widowed)$/ },
    preferred_pet: { value: /^(dog|cat|hamster|fish|rabbit|zebra)$/ },
    preferred_color: { value: /^(red|green|yellow|black|orange|blue)$/ },
    preferred_movie: {
      value: /^(titanic|alien|se7en|batman|goodfellas|argo)$/,
    },
  },
};
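
For the validation use case, a minimal sketch of a recursive check could look like the following. It assumes every leaf of the template is a RegExp object (not a string), and the names validateRecord and template are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper: checks a record against a template whose leaves are RegExp objects.
const validateRecord = (template, record) =>
  Object.entries(template).every(([key, rule]) => {
    const value = record?.[key];
    // Leaf: test the stringified value against the pattern
    if (rule instanceof RegExp) return rule.test(String(value));
    // Nested object: recurse, but only if the record also has an object at this key
    return typeof value === "object" && value !== null && validateRecord(rule, value);
  });

// Small illustrative template (an assumption, not the full structure)
const template = {
  id: /^[0-9]{5}$/,
  data: { n_of_kids: { value: /^[0-7]$/ } },
};

const good = { id: 12345, data: { n_of_kids: { value: 3 } } };
const bad = { id: 123, data: { n_of_kids: { value: 3 } } };

console.log(validateRecord(template, good)); // true
console.log(validateRecord(template, bad)); // false
```

Because the check only walks the keys of the template, extra properties on the record are ignored; a stricter variant would also compare key sets.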

Well, structureTemplateRegex might be good for validation, but not for generating data. So another way to approach the problem is to write a generator function for each property in the record.

// Returns a string of n random digits (it may start with 0)
const generateId = (n = 5) => [...Array(n)].map(_=>Math.random()*10|0).join``; // https://stackoverflow.com/a/70598339/6105259
// Months are zero-based in the Date constructor (8 = September, 11 = December); toISOString() reports the date in UTC
const generateDate = (start = new Date(2018, 8, 9), end = new Date(2021, 11, 15)) => new Date(start.getTime() + Math.random() * (end.getTime() - start.getTime())).toISOString().slice(0,10); // https://stackoverflow.com/a/39472913/6105259
// Both min and max are inclusive
const randomInteger = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min; // https://stackoverflow.com/a/29246176/6105259
const randomElement = (arr) => arr[(Math.random() * arr.length) | 0]; // https://stackoverflow.com/a/38448710/6105259
const generateFamStatus = () => randomElement(["married", "single", "divorced", "widowed"]);
const generatePet = () => randomElement(["dog", "cat", "hamster", "fish", "rabbit", "zebra"]);
const generateColor = () => randomElement(["red", "green", "yellow", "black", "orange", "blue"]);
const generateMovie = () => randomElement(["titanic", "alien", "se7en", "batman", "goodfellas", "argo"]);

// and then: store the functions themselves (not their results),
// so that each simulated record can draw fresh values
const structureTemplateGenerators = {
    id: generateId, // 5 digits
    createdAt: generateDate, // yyyy-mm-dd
    data: { 
      age: { value: () => randomInteger(0, 100) }, // 0-100 (max is inclusive)
      height: { value: () => randomInteger(10, 220) }, // 10-220
      weight: { value: () => randomInteger(30, 300) }, // 30-300
      n_of_kids: { value: () => randomInteger(0, 7) }, // 0-7
      fam_status: { value: generateFamStatus },
      preferred_pet: { value: generatePet },
      preferred_color: { value: generateColor },
      preferred_movie: {
        value: generateMovie,
      },
    },
  };

But I'm not really sure how to proceed down this path. I have the materials I need, but not the technique. Essentially, I want to call a function that takes two parameters: (1) the structure of one representative record, and (2) the number n of records to simulate. The function would return an array of length n containing randomly generated records.

// pseudocode
generateRecords(structureTemplateGenerators, 5) // but `n` could potentially be 10 or 10000 or 3e7

// would return
const possibleOutput = [
  {
    id: 12045,
    createdAt: '2021-02-21',
    data: {
      age: { value: 15 },
      height: { value: 80 },
      weight: { value: 100 },
      n_of_kids: { value: 1 },
      fam_status: { value: 'widowed' },
      preferred_pet: { value: 'dog' },
      preferred_color: { value: 'yellow' },
      preferred_movie: { value: 'se7en' },
    },
  },
  {
    id: 39847,
    createdAt: '2020-12-02',
    data: {
      age: { value: 33 },
      height: { value: 56 },
      weight: { value: 210 },
      n_of_kids: { value: 3 },
      fam_status: { value: 'married' },
      preferred_pet: { value: 'zebra' },
      preferred_color: { value: 'blue' },
      preferred_movie: { value: 'argo' },
    },
  },
  {
    id: 22435,
    createdAt: '2018-10-10',
    data: {
      age: { value: 25 },
      height: { value: 103 },
      weight: { value: 165 },
      n_of_kids: { value: 5 },
      fam_status: { value: 'married' },
      preferred_pet: { value: 'dog' },
      preferred_color: { value: 'green' },
      preferred_movie: { value: 'titanic' },
    },
  },
  {
    id: 61194,
    createdAt: '2019-04-10',
    data: {
      age: { value: 20 },
      height: { value: 90 },
      weight: { value: 100 },
      n_of_kids: { value: 3 },
      fam_status: { value: 'divorced' },
      preferred_pet: { value: 'hamster' },
      preferred_color: { value: 'blue' },
      preferred_movie: { value: 'batman' },
    },
  },
  {
    id: 22231,
    createdAt: '2021-10-01',
    data: {
      age: { value: 77 },
      height: { value: 160 },
      weight: { value: 69 },
      n_of_kids: { value: 1 },
      fam_status: { value: 'divorced' },
      preferred_pet: { value: 'dog' },
      preferred_color: { value: 'red' },
      preferred_movie: { value: 'titanic' },
    },
  },
];
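
One possible way to get from a template to such an output is to walk the template recursively and call every function found at a leaf. The following is only a sketch under stated assumptions: the template's leaves are functions rather than already-computed values, and the names generateFromTemplate, generateRecords, and the small template below are hypothetical.

```javascript
const randomInteger = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;
const randomElement = (arr) => arr[Math.floor(Math.random() * arr.length)];

// Walk the template: call functions, recurse into arrays and plain objects,
// and return any other value (number, string, null, ...) unchanged.
const generateFromTemplate = (node) => {
  if (typeof node === "function") return node();
  if (Array.isArray(node)) return node.map(generateFromTemplate);
  if (node !== null && typeof node === "object") {
    return Object.fromEntries(
      Object.entries(node).map(([key, child]) => [key, generateFromTemplate(child)])
    );
  }
  return node; // constant leaf
};

// Build n independent records from the same template
const generateRecords = (template, n) =>
  Array.from({ length: n }, () => generateFromTemplate(template));

// Illustrative template (an assumption): leaves are functions, not values
const template = {
  id: () => randomInteger(10000, 99999),
  data: {
    age: { value: () => randomInteger(0, 100) },
    fam_status: { value: () => randomElement(["married", "single", "divorced", "widowed"]) },
  },
};

const records = generateRecords(template, 5);
console.log(records.length); // 5
```

Since the walker knows nothing about the specific keys, it should work for any nesting of plain objects and arrays. Non-plain objects such as Date instances would need their own branch, and for very large n it may be preferable to yield records one at a time from a generator function instead of materializing the whole array.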

More Context

I have a JavaScript application that accepts an array of records and performs a number of aggregate calculations over that data. Ultimately, the app returns a JSON-like output. Knowing the structure of the input records, I want to randomly simulate an array containing n records. Such a procedure would allow me to test my app in terms of both (1) the correctness of its calculations and (2) its speed as n increases.
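
To make the speed test concrete, here is a rough sketch of timing an aggregation over inputs of increasing size. Everything in it is an assumption made for illustration: averageAge stands in for the app's real calculations, makeRecords stands in for the record generator, and Date.now() gives only millisecond resolution.

```javascript
// Hypothetical stand-in for the app's aggregation step
const averageAge = (records) =>
  records.reduce((sum, r) => sum + r.data.age.value, 0) / records.length;

// Hypothetical stand-in for a record generator (ages 0-100)
const makeRecords = (n) =>
  Array.from({ length: n }, () => ({
    data: { age: { value: Math.floor(Math.random() * 101) } },
  }));

// Time `fn` on inputs of increasing size; returns one { n, ms } entry per size
const benchmark = (fn, sizes) =>
  sizes.map((n) => {
    const input = makeRecords(n); // generated outside the timed region
    const start = Date.now();
    fn(input);
    return { n, ms: Date.now() - start };
  });

const timings = benchmark(averageAge, [1000, 10000, 100000]);
console.log(timings);
```

Generating the input before starting the clock keeps the cost of simulation out of the measurement, so the timings reflect only the aggregation.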



