Sunday, 24 March 2019

How to sample from a CDF in Python

Could you please advise how to sample from a distribution with defined probabilities (or suggest a better solution to my problem below)?

I wrote a simple script to test myself in another language. The inputs are pairs of words in two languages: I receive a question (one word in English) and provide the answer in the other language, which the script then checks. To pick the word I use a simple rand_word = randint(0, overall), where overall is the number of words. But I would like the later words (those added more recently to my CSV) to appear more often. I could write a few conditions, but there is probably a much simpler way to sample from the words. For example, I have 3000 word pairs. I would like words from the first 20% of the list to appear with a probability of 10% or 20%, and the last 20% of words with a probability of 50-60%. This is just an example. It could be a linear solution, but I want to avoid the situation where the first words appear extremely rarely and the latest ones too often. So the left and right tails should preferably be adjustable.
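One possible approach, offered as a minimal sketch rather than a definitive answer, is to give every word a weight that grows with its position in the file and let the standard library's random.choices do the weighted draw. The helper names recency_weights and pick_index, and the parameters oldest_weight, newest_weight and power, are hypothetical and introduced here only for illustration. The sketch assumes the word pairs are stored in the order they were added, oldest first.

```python
import random


def recency_weights(n, oldest_weight=1.0, newest_weight=5.0, power=1.0):
    """Return n weights rising from oldest_weight (index 0) to newest_weight (index n - 1).

    power=1.0 gives a linear ramp; power > 1 keeps the weights low for longer
    and concentrates more of the probability on the newest entries.
    """
    if n == 1:
        return [newest_weight]
    weights = []
    for i in range(n):
        t = i / (n - 1)  # relative position: 0.0 = oldest word, 1.0 = newest word
        weights.append(oldest_weight + (newest_weight - oldest_weight) * t ** power)
    return weights


def pick_index(weights, rng=random):
    """Draw one index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


overall = 3000                      # number of word pairs, as in the question
weights = recency_weights(overall)  # compute once, reuse for every draw
rand_word = pick_index(weights)     # index in 0 .. overall - 1
```

With the default values above (a linear ramp from 1 to 5), the first 20% of the words receive roughly 9% of the draws and the last 20% roughly 31%. Raising oldest_weight lifts the left tail so that old words never become too rare, while raising newest_weight or power shifts more probability to the right tail. Note that the index returned here never equals overall, so it can be used directly to index a list of overall word pairs.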



