I found a code snippet that appears to use a dictionary file to search a random string of letters for all possible English words. For example, given the string xcganrpgrokp, it should find the words can, car and pop. The main point is that I want it to find words by skipping over letters: the letters of a found word must appear in order, but they don't have to be consecutive.
It appears to partially work. However, it's mostly finding words whose letters don't even appear in the random string. Since xrange no longer exists in Python 3, I changed it to range. I also commented out the line that appends the reversed text. So, what could be going wrong?
from random import choice
import string

dictionary = set(open('words.txt', 'r').read().lower().split())
max_len = max(map(len, dictionary))  # longest word in the set of words
text = ''.join([choice(string.ascii_lowercase) for i in range(28000)])
# text += '-' + text[::-1]  # append the reverse of the text to itself
words_found = set()  # set of words found, starts empty
for i in range(len(text)):  # for each possible starting position in the corpus
    chunk = text[i:i+max_len+1]  # chunk that is the size of the longest word
    for j in range(1, len(chunk)+1):  # loop to check each possible subchunk
        word = chunk[:j]  # subchunk (always a contiguous slice of text)
        if word in dictionary:  # constant time hash lookup if it's in dictionary
            words_found.add(word)  # add to set of words
print(words_found)
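One thing that stands out in the snippet above is that chunk[:j] is always a contiguous slice of the text, so it can only match words whose letters sit next to each other. It never skips letters. A minimal sketch of the "skipping over letters" behaviour, assuming what is wanted is a subsequence test, could look like the following. The names is_subsequence, find_words and sample_dictionary are made up for illustration, and the small set stands in for the contents of words.txt.

```python
def is_subsequence(word, text):
    """Return True if the letters of `word` appear in `text` in order, gaps allowed."""
    remaining = iter(text)
    # `letter in remaining` advances the iterator until it finds `letter`,
    # so each later letter is only searched for after the previous match.
    return all(letter in remaining for letter in word)


def find_words(text, dictionary):
    """Collect every dictionary word that is a subsequence of `text`."""
    return {word for word in dictionary if is_subsequence(word, text)}


# Hypothetical stand-in for the contents of words.txt.
sample_dictionary = {"can", "car", "pop", "dog", "nac"}
print(sorted(find_words("xcganrpgrokp", sample_dictionary)))
# prints ['can', 'car', 'pop']
```

Note that this loops over the dictionary rather than over positions in the text. With a 28000-letter random string, most short words are likely to show up as subsequences, so a much shorter string (like the 12-letter example) is probably what makes the results meaningful.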