I am working on a project where I want to read a large text file and randomly select a full sentence from it. If the sentence is 280 characters or less, print it out. If not, select another sentence until it finds one that is 280 characters or less. Using nltk I am able to break the text down into individual sentences:
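import random

import nltk.data

# Load the pre-trained Punkt sentence tokenizer for English
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

# Read the whole source text; the with-block closes the file afterwards
with open("test.txt") as fp:
    data = fp.read()

# Write the sentences out, separated by blank lines
with open("newData.txt", "w") as f:
    f.write('\n\n'.join(tokenizer.tokenize(data)))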
I then tried using random.choice() with readlines() to select a sentence, but that gives me only lines as opposed to sentences:
line = random.choice(open('newData.txt').readlines())
print line
But that is just printing an actual line as opposed to the full sentence.
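A likely reason, sketched here with made-up data: readlines() splits on every newline, so a sentence that itself contains a line break is cut into pieces, and the blank separator lines become candidates too. The two sentences below are hypothetical, and splitlines(True) is used as a stand-in for what readlines() would return from the file.

```python
# Two hypothetical sentences; the first wraps across a line break,
# as sentences in a hard-wrapped text file often do.
sentences = ["This sentence wraps\nonto a second line.", "Short one."]

# Same layout as newData.txt: sentences separated by a blank line.
written = "\n\n".join(sentences)

# Splitting on every newline mirrors readlines() for this text:
# the first sentence is cut in two and a bare "\n" entry appears.
lines = written.splitlines(True)

# Splitting on the blank-line separator recovers whole sentences,
# assuming no sentence contains a blank line itself.
records = written.split("\n\n")

print(lines)
print(records)
```

So random.choice() over the lines can return half a sentence or an empty line, while choosing from the separator-split records (or from the tokenizer's list directly) returns whole sentences.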
For the character limit, I think I might have to use len(), but I am still working that out.
Any suggestions would be great.
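One possible approach, as a minimal sketch rather than a definitive answer: skip the intermediate file, keep the list that tokenizer.tokenize() returns, filter it with len(), and let random.choice() pick from what is left. The names pick_short_sentence, load_sentences and max_len are hypothetical, the limit is assumed to be inclusive (280 characters is allowed), and the tokenizer is loaded with the same call as in the question, which may differ between NLTK versions.

```python
import random


def pick_short_sentence(sentences, max_len=280):
    """Return a random sentence of at most max_len characters, or None."""
    # Collapse line breaks and repeated spaces so each sentence is one line.
    candidates = [" ".join(s.split()) for s in sentences]
    # Filter first instead of re-drawing in a loop: this cannot spin
    # forever when no sentence is short enough.
    candidates = [s for s in candidates if s and len(s) <= max_len]
    return random.choice(candidates) if candidates else None


def load_sentences(path):
    """Tokenize a text file into sentences (assumes the Punkt data is installed)."""
    # Imported here so pick_short_sentence can be used without nltk.
    import nltk.data
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    with open(path) as fp:
        return tokenizer.tokenize(fp.read())


# Usage, with the file name from the question:
#     print(pick_short_sentence(load_sentences("test.txt")))
```

Filtering before choosing gives every short sentence the same chance of being picked, which is the same distribution as drawing repeatedly and rejecting the long ones.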