vendredi 29 mai 2020

How to take the most similar rand generated string to a master string, mutate the random string through generations to achieve the original string?

i have generated N random sequences of length L in the letters A, C, G, T(e.g N= 10, L= 21) and created a function that calculates the distance between two 21 letter words,

enter image description here

as you can see, the output of this consists of the master genome and the mutations of said genome and how different the mutated genomes are from the original genome(how many characters are similar).

The issue is, i would like to be able to select the random sequence that has minimum distance to the 21-letter master sequence GATACCAATTACCAGATACCA. An alternative perspective of the same thing is to find the random sequence with the maximal overlap with the master sequence. This is a measure of fitness for these random sequences, and you select the random sequence with the highest fitness. Write a function that measures this overlap/fitness between a random 21-letter sequence and the master sequence.

After this initial step, create 10 offspring copies of that maximum fitness sequence, but evolve them by having a certain probability (e.g. 1/100) for each of the 21 letters to be randomly turned into a random letter of the alphabet (a mutation).

At each generation step, select from the 10 offspring the one with the highest fitness, and use that to evolve again in the next generation.

i would like to then be able to be able to repeat this process for a number of G generations, until the original master genome has been achieved through evolution- and how many generations it took- It should look something like this:

enter image description here




Aucun commentaire:

Enregistrer un commentaire