Thursday, October 26, 2023

Use of a bitmask in an identifier generator

The popular nanoid package uses a bitmask to force random bytes to fit a given alphabet.

const bytes = [145, 94, 92, 130, 74, 86, 7, 30, 139, 223, 44, 36, 29, 134, 59];
const alphabet = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";

function nid1() {
  const mask = (2 << (31 - Math.clz32((alphabet.length - 1) | 1))) - 1;

  let id = "";

  for (let i = 0; i < bytes.length; i++) {
    // For bytes[i] = 59 the masked index is 59, but the alphabet has only
    // 58 characters, so alphabet[59] is undefined!
    id += alphabet[bytes[i] & mask];
  }

  return id;
}
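To see where the overflow comes from, the mask can be computed on its own and applied to the sample bytes. The snippet below is a small sketch that reuses the values above under new, illustrative names (sampleBytes, sampleAlphabet, sampleMask, outOfRange); it is not part of nanoid.

```javascript
// Illustrative copies of the values used in the example above.
const sampleBytes = [145, 94, 92, 130, 74, 86, 7, 30, 139, 223, 44, 36, 29, 134, 59];
const sampleAlphabet = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";

// Same formula as in nid1: the smallest (2^k - 1) that covers alphabet.length - 1.
const sampleMask = (2 << (31 - Math.clz32((sampleAlphabet.length - 1) | 1))) - 1;

// Collect the masked values that have no matching character in the alphabet.
const outOfRange = sampleBytes
  .map((byte) => byte & sampleMask)
  .filter((index) => index >= sampleAlphabet.length);

console.log(sampleAlphabet.length); // 58
console.log(sampleMask);            // 63
console.log(outOfRange);            // [ 59 ]
```

The alphabet has 58 characters but the mask lets through 64 values (0 to 63), so the six values 58 to 63 have no character to map to.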

Sometimes the masked value overflows the alphabet. nanoid mitigates this by appending || '' to alphabet[bytes[i] & mask], which discards the out-of-range byte, and by requesting extra random bytes up front. Here is their full function:

function customRandom(alphabet, defaultSize, getRandom) {
  // First, a bitmask is necessary to generate the ID. The bitmask makes bytes
  // values closer to the alphabet size. The bitmask calculates the closest
  // `2^31 - 1` number, which exceeds the alphabet size.
  // For example, the bitmask for the alphabet size 30 is 31 (00011111).
  let mask = (2 << (31 - Math.clz32((alphabet.length - 1) | 1))) - 1
  // Though, the bitmask solution is not perfect since the bytes exceeding
  // the alphabet size are refused. Therefore, to reliably generate the ID,
  // the random bytes redundancy has to be satisfied.

  // Note: every hardware random generator call is performance expensive,
  // because the system call for entropy collection takes a lot of time.
  // So, to avoid additional system calls, extra bytes are requested in advance.

  // Next, a step determines how many random bytes to generate.
  // The number of random bytes gets decided upon the ID size, mask,
  // alphabet size, and magic number 1.6 (using 1.6 peaks at performance
  // according to benchmarks).
  let step = Math.ceil((1.6 * mask * defaultSize) / alphabet.length)

  return (size = defaultSize) => {
    let id = ''
    while (true) {
      let bytes = getRandom(step)
      // A compact alternative for `for (let i = 0; i < step; i++)`.
      let i = step
      while (i--) {
        // Adding `|| ''` refuses a random byte that exceeds the alphabet size.
        id += alphabet[bytes[i] & mask] || ''
        if (id.length === size) return id
      }
    }
  }
}
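The arithmetic behind mask and step is easier to follow with concrete numbers. The helper below, maskAndStep, is a hypothetical name introduced only for illustration; it repeats the two formulas from customRandom and adds the fraction of masked values that land inside the alphabet, assuming uniformly distributed bytes. The ID size of 21 is an arbitrary example value.

```javascript
// Hypothetical helper: reproduces only the mask and step arithmetic of customRandom.
function maskAndStep(alphabetLength, size) {
  const mask = (2 << (31 - Math.clz32((alphabetLength - 1) | 1))) - 1;
  const step = Math.ceil((1.6 * mask * size) / alphabetLength);
  // Fraction of masked values (0..mask) that map to a real character.
  const acceptRate = alphabetLength / (mask + 1);
  return { mask, step, acceptRate };
}

console.log(maskAndStep(58, 21)); // { mask: 63, step: 37, acceptRate: 0.90625 }
console.log(maskAndStep(30, 21)); // { mask: 31, step: 35, acceptRate: 0.9375 }
```

With a 58-character alphabet roughly one byte in ten is discarded, which is why more bytes than the ID size are requested per call.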

Why did they not simply use the modulus operator? Is this only for performance, or are there cryptographic implications?

function nid2() {
  let id = "";

  for (let i = 0; i < bytes.length; i++) {
    id += alphabet[bytes[i] % alphabet.length];
  }

  return id;
}
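One way to probe the question is to count, for each character of the alphabet, how many of the 256 possible byte values map to it under each approach. The sketch below uses a hypothetical helper, countHits, and assumes uniformly distributed random bytes. It only measures the distribution and does not establish why nanoid's authors made their choice.

```javascript
// Hypothetical helper: for each alphabet index, count how many of the 256
// possible byte values map to it under the given mapping function.
function countHits(alphabetLength, toIndex) {
  const hits = new Array(alphabetLength).fill(0);
  for (let byte = 0; byte < 256; byte++) {
    const index = toIndex(byte);
    if (index < alphabetLength) hits[index]++; // out-of-range values are rejected
  }
  return hits;
}

const size58 = 58; // length of the alphabet used above
const moduloHits = countHits(size58, (byte) => byte % size58);
const maskHits = countHits(size58, (byte) => byte & 63);

console.log(Math.min(...moduloHits), Math.max(...moduloHits)); // 4 5
console.log(Math.min(...maskHits), Math.max(...maskHits));     // 4 4
```

Since 256 = 4 × 58 + 24, the modulus version maps five byte values to each of the first 24 characters and only four to each of the others, so those 24 characters come up slightly more often. The mask with rejection maps exactly four byte values to every character, at the cost of throwing some bytes away.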


