Monday, November 11, 2019

Selecting a random sentence of 280 characters or fewer from a text file

I am working on a project where I want to read a large text file and randomly select a full sentence from it. If that sentence is 280 characters or fewer, print it; if not, select another sentence until one of 280 characters or fewer is found. Using nltk, I am able to break the text down into individual sentences:

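import nltk.data
import random

# Load the pre-trained Punkt sentence tokenizer for English
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

# Read the whole source text (the with-block closes the file afterwards)
with open("test.txt") as fp:
    data = fp.read()

# Write the sentences out, separated by a blank line
newData = '\n\n'.join(tokenizer.tokenize(data))
with open("newData.txt", "w") as f:
    f.write(newData)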
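The retry behaviour described above (draw a random sentence, and draw again if it is longer than 280 characters) can be sketched as a small rejection-sampling loop. This is a minimal sketch, assuming the sentences are already in a Python list such as the one returned by tokenizer.tokenize(data); the name pick_short_sentence and its max_len parameter are hypothetical. The check up front prevents an endless loop when no sentence is short enough.

```python
import random


def pick_short_sentence(sentences, max_len=280):
    """Draw random sentences until one has at most max_len characters.

    pick_short_sentence is a hypothetical helper, not part of nltk.
    """
    # Without this check the loop below would never end if every
    # sentence were too long (or if the list were empty).
    if not any(len(s) <= max_len for s in sentences):
        raise ValueError("no sentence of %d characters or fewer" % max_len)
    while True:
        candidate = random.choice(sentences)
        # len() on a str counts its characters
        if len(candidate) <= max_len:
            return candidate


# The 300-character sentence is always rejected
print(pick_short_sentence(["x" * 300, "A short sentence."]))  # prints: A short sentence.
```

The same function works directly on tokenizer.tokenize(data), so the intermediate newData.txt file is not strictly needed for the selection step.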

I then tried using random.choice() with readlines() to select a sentence:

with open('newData.txt') as f:
    line = random.choice(f.readlines())
print(line)

But that prints a single line of the file, which can be only part of a sentence or one of the blank separator lines, rather than a full sentence.
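Since newData.txt separates sentences with a blank line ('\n\n'), one way to get whole sentences back is to read the entire file and split on that separator instead of calling readlines(). This is a minimal sketch, assuming no sentence itself contains a blank line; read_sentences is a hypothetical helper name.

```python
def read_sentences(path):
    """Return the sentences stored in path, one per blank-line-separated block.

    Assumes the file was written by joining sentences with a blank line,
    as newData.txt is, and that no sentence contains a blank line itself.
    """
    with open(path) as f:
        text = f.read()
    # Splitting on the separator keeps multi-line sentences whole;
    # strip() trims stray whitespace and the filter drops empty chunks.
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]
```

A random full sentence is then random.choice(read_sentences("newData.txt")).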

For the character limit, I think I need len(), but I am still working that out.
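An alternative to retrying is to filter with len() first and then choose only among the sentences that pass. This is a minimal sketch, assuming sentences is a list of strings (for example from tokenizer.tokenize(data)); random_short_sentence is a hypothetical name, and returning None when nothing fits is an arbitrary choice.

```python
import random


def random_short_sentence(sentences, max_len=280):
    """Return a random sentence of at most max_len characters, or None."""
    # Keep only the sentences that satisfy the length limit
    short = [s for s in sentences if len(s) <= max_len]
    if not short:
        return None  # no sentence is short enough
    # Every remaining sentence is equally likely to be chosen
    return random.choice(short)
```

This makes a single pass over the list and cannot loop forever, at the cost of building a second list of the qualifying sentences.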

Any suggestions would be great.



