dimanche 22 juillet 2018

What better way to design the built-in stats of a random generator?

Context : a random sentence generator

1) the function generateSentence() generates random sentences returned as strings (works fine)

2) the function calculateStats() outputs the number of unique strings the above function can theoretically generate (works fine also in this mockup, so be sure to read the disclaimer, I don't want to waste your time)

3) the function generateStructure() and the words lists in Dictionnary.lists are constantly growing as time passes

Quick mockup of the main generator function :

function generateSentence() {
  var words = [];
  var structure = generateStructure();

  structure.forEach(function(element) {
    words.push(Dictionnary.getElement(element));
  });

  var fullText = words.join(" ");
  fullText = fullText.substring(0, 1).toUpperCase() + fullText.substring(1);
  fullText += ".";
  return fullText;
}

var Dictionnary = {
  getElement: function(listCode) {
    return randomPick(Dictionnary.lists[listCode]);
  },
  lists: {
    _location: ["here,", "at my cousin's,", "in Antarctica,"],
    _subject: ["some guy", "the teacher", "Godzilla"],
    _vTransitive: ["is eating", "is scolding", "is seeing"],
    _vIntransitive: ["is working", "is sitting", "is yawning"],
    _adverb: ["slowly", "very carfully", "with a passion"],
    _object: ["this chair", "an egg", "the statue of Liberty"],
  }
}

// returns an array of strings symbolizing types of sentence elements
// example : ["_location", "_subject", "_vIntransitive"]
function generateStructure() {
  var str = [];

  if (dice(6) > 5) {// the structure can begin with a location or not
    str.push("_location");
  }

  str.push("_subject");// the subject is mandatory

  // verb can be of either types
  var verbType = randomPick(["_vTransitive", "_vIntransitive"]);
  str.push(verbType);

  if (dice(6) > 5) {// adverb is optional
    str.push("_adverb");
  }

  // the structure needs an object if the verb is transitive
  if (verbType == "_vTransitive") {
    str.push("_object");
  }

  return str;
}

// off-topic warning! don't mind the implementation here,
// just know it's a random pick in the array
function randomPick(sourceArray) {
  return sourceArray[dice(sourceArray.length) - 1];
}

// Same as above, not the point, just know it's a die roll (random integer from 1 to max)
function dice(max) {
  if (max < 1) { return 0; }
  return Math.round((Math.random() * max) + .5);
}

At some point I wanted to know how many different unique strings it could output and I wrote something like (again, very simplified) :

function calculateStats() {// the "broken leg" function I'm trying to improve/replace
  var total = 0;
  // lines above : +1 to account for 'no location' or 'no adverb'
  var nbOfLocations = Dictionnary.lists._location.length + 1;
  var nbOfAdverbs = Dictionnary.lists._adverb.length + 1;

  var nbOfTransitiveSentences = 
    nbOfLocations *
    Dictionnary.lists._vTransitive.length *
    nbOfAdverbs *
    Dictionnary.lists._object.length;
  var nbOfIntransitiveSentences =
    nbOfLocations *
    Dictionnary.lists._vIntransitive.length *
    nbOfAdverbs;

  total = nbOfTransitiveSentences + nbOfIntransitiveSentences;
  return total;
}

(Side note : don't worry about namespace pollution, type checks on input parameters, or these sort of things, this is assumed to be in a bubble for the sake of example clarity.)

Important disclaimer : This is not about fixing the code I posted. This is a mock-up and it works as it is. The real question is "As the complexity of possible structures grow in the future, as well as the size and diversity of the lists, what would be a better strategy to calculate stats for these types of random constructions, rather than my clumsy calculateStats() function, which is hard to maintain, likely to handle astronomically big numbers*, and prone to errors?"

* in real context, the total number has exceeded (10 power 80) for some time already...




Aucun commentaire:

Enregistrer un commentaire