Sunday, January 24, 2021

Finding English words in a random string

I found a code snippet that appears to use a dictionary file to search for all possible English words in a random string of letters. For example, if the string were xcganrpgrokp, it would find the words can, car and pop. The main point is that I want it to find words by skipping over letters: the letters of a word don't have to be consecutive in the string, they only have to appear in the right order.

It appears to partially work. However, it's mostly finding words whose letters don't even appear in the random string. Since xrange is no longer available in Python 3, I changed it to range. I also commented out the line that appends the reversed text. So, what could be going wrong?

from random import choice
import string

# load the word list into a set (assumes words.txt is in the working directory)
with open('words.txt', 'r') as f:
    dictionary = set(f.read().lower().split())
max_len = max(map(len, dictionary)) #longest word in the set of words

text = ''.join([choice(string.ascii_lowercase) for i in range(28000)])
#text += '-'+text[::-1] #append the reverse of the text to itself

words_found = set() #set of words found, starts empty
for i in range(len(text)): #for each possible starting position in the corpus
    chunk = text[i:i+max_len+1] #chunk one letter longer than the longest word
    for j in range(1,len(chunk)+1): #loop to check each possible subchunk
        word = chunk[:j] #subchunk (always a run of consecutive letters)
        if word in dictionary: #constant time hash lookup if it's in dictionary
            words_found.add(word) #add to set of words

print(words_found)
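
One thing worth noting about the snippet: it only tests slices of consecutive letters, and it runs on a freshly generated text of 28000 random letters, not on a string I supply. That would explain both symptoms. As a minimal sketch of the skipping-over-letters behaviour described above, the check could go the other way around: test each dictionary word to see whether its letters appear in the string in order. The names is_subsequence, find_words and sample_dictionary are hypothetical, and the small sample set only stands in for the contents of words.txt.

```python
def is_subsequence(word, text):
    """Return True if the letters of `word` appear in `text` in order, gaps allowed."""
    remaining = iter(text)
    # `letter in remaining` consumes the iterator up to the first match,
    # so each letter is searched for only after the previous one was found.
    return all(letter in remaining for letter in word)


def find_words(text, dictionary):
    """Return the set of dictionary words that are subsequences of `text`."""
    return {word for word in dictionary if is_subsequence(word, text)}


# Hypothetical stand-in for the word list loaded from words.txt.
sample_dictionary = {"can", "car", "pop", "dog", "nac"}

found = find_words("xcganrpgrokp", sample_dictionary)
print(sorted(found))  # ['can', 'car', 'pop']
```

This does one pass over the string per dictionary word, which should be fine for a short string. For a very long string nearly every word would match, so the approach is probably only interesting for short inputs like the example.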


