Friday, 26 April 2019

How to split a set of strings into substrings in Python, making shorter substrings more likely?

I have a set of strings, each several million characters long. I want to split them into substrings of random length, which I can already do without any particular issue.

However, my question is: how can I apply some sort of weight to the choice of substring length? My code runs on Python 3, so I would like to find a Pythonic solution. In detail, my aim is to:

  • split the strings into substrings that range in length between 1e4 (10,000) and 8e6 (8,000,000) characters.
  • make the script choose a short length (near 1e4) more often than a long length (near 8e6) for the newly generated substrings, i.e. a likelihood that decreases as the length increases.

Thanks for the help!
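
To make the goal concrete, here is a minimal sketch of one way it could be done, not necessarily the best one. It assumes that "descending likelihood" can be modelled by drawing lengths uniformly in log space (density proportional to 1/length); the question does not fix the shape of the distribution, so that choice is an assumption. The names `MIN_LEN`, `MAX_LEN`, `random_length` and `split_weighted` are hypothetical and only serve the illustration.

```python
import math
import random

MIN_LEN = 10_000      # 1e4, lower bound from the question
MAX_LEN = 8_000_000   # 8e6, upper bound from the question


def random_length(rng, lo=MIN_LEN, hi=MAX_LEN):
    """Draw an integer length in [lo, hi], favouring short lengths.

    Assumption: sampling uniformly in log space, which gives a
    probability density proportional to 1/length.
    """
    value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    # Clamp to guard against floating-point rounding at the boundaries.
    return min(hi, max(lo, round(value)))


def split_weighted(text, rng=None, lo=MIN_LEN, hi=MAX_LEN):
    """Yield consecutive substrings of `text` with randomly drawn lengths.

    The final substring is whatever remains, so it may be shorter than `lo`.
    """
    if rng is None:
        rng = random.Random()
    pos = 0
    while pos < len(text):
        n = random_length(rng, lo, hi)
        yield text[pos:pos + n]
        pos += n
```

It would be used as `pieces = list(split_weighted(my_string))`, optionally passing `random.Random(seed)` for reproducible splits. If a linearly decreasing likelihood is preferred instead, `random.triangular(lo, hi, lo)` from the standard library could replace the log-uniform draw, although it favours short lengths much less strongly over a range this wide.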



