I need to solve a sentence generation problem in Python. Given keywords that describe an image, I need to generate a long, multi-sentence description. In more detail:
Given (not everything must be used):
- list of ~100 keywords that strongly describe an image
- list of ~120 keywords that are somewhat related to that image, but don't describe anything crucial about it
- In addition to this rough prioritization, I have more detailed prioritization inside of those groups of words
- list of ~150 keywords that clearly don't fit the image
- I have about a million low-quality example sentences, of which about 10% contain some kind of error (grammar, spelling, word order, cut-off, etc.)
- list of default words that are always allowed in sentences (the, and, or, in, a, at, by, is...)
Constraints:
- I need to maximize word variety. Each word may be used only once (except default words such as the, and, or, in, a, at, is...). This means that once a word has been used, it is no longer available for generating the next sentence. (I guess this could be solved by updating the list of allowed words each time a new sentence is generated.)
- My keywords are prioritized, which means higher-priority keywords should have a higher probability of occurring in the initial sentences.
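The bookkeeping described in these constraints could be sketched roughly as follows. This is only an illustration, not a full solution: the `WordPool` class, its method names, and the numeric weights are all hypothetical, and it assumes each keyword's priority can be expressed as a positive number.

```python
import random

# Default words that never get "used up" (taken from the list above).
DEFAULT_WORDS = {"the", "and", "or", "in", "a", "at", "by", "is"}


class WordPool:
    """Tracks which keywords are still available and samples them by priority."""

    def __init__(self, priorities, seed=None):
        # priorities: dict mapping keyword -> positive weight (higher = more important).
        # The weights are an assumption; any positive numbers work.
        self.available = dict(priorities)
        self.rng = random.Random(seed)

    def sample(self, k):
        """Draw up to k distinct keywords, favouring high weights.

        The words stay in the pool until mark_used() is called, so a
        rejected sentence does not consume them.
        """
        pool = dict(self.available)
        chosen = []
        while pool and len(chosen) < k:
            words = list(pool)
            weights = [pool[w] for w in words]
            pick = self.rng.choices(words, weights=weights, k=1)[0]
            chosen.append(pick)
            del pool[pick]
        return chosen

    def mark_used(self, sentence):
        """Remove every non-default word of an accepted sentence from the pool."""
        for token in sentence.lower().split():
            token = token.strip(".,;:!?\"'")
            if token not in DEFAULT_WORDS:
                self.available.pop(token, None)


pool = WordPool({"beach": 5.0, "sunset": 3.0, "palm": 1.0}, seed=0)
candidates = pool.sample(2)            # keywords to try in the next sentence
pool.mark_used("The beach is at sunset.")
print(sorted(pool.available))          # keywords still left for later sentences
```

Because high-weight keywords are more likely to be drawn first and are then removed, they tend to end up in the early sentences, while later sentences are built from whatever remains.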
Freedoms:
- If really necessary, words can be turned into different forms: love -> loving -> loves, decoration -> decorative, blob -> blobs, is -> be
- I am aware that keywords like ["mouse", "elephant", "fear"] can result in "Mouse fears elephant" or "Elephant fears mouse". I will throw those sentences away by hand. It would be nice to detect automatically which one is more probable, but it's NOT necessary (the count of Google search results might help, I think).
- Not all words must be used. It's OK if some words are left over for which there is little possibility of making a correct sentence.
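To make the rules above concrete, a candidate sentence could be checked against the keyword lists with something like the sketch below. The `BASE_FORM` table is a hypothetical hand-written stand-in for a real lemmatizer, and the function names are made up for illustration.

```python
import re

DEFAULT_WORDS = {"the", "and", "or", "in", "a", "at", "by", "is"}

# Hypothetical hand-written table mapping inflected forms to their base keyword.
# In practice a lemmatizer would replace this.
BASE_FORM = {
    "loving": "love",
    "loves": "love",
    "decorative": "decoration",
    "blobs": "blob",
    "fears": "fear",
}


def keywords_in(sentence):
    """Return the base forms of all non-default words, in order of appearance."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [BASE_FORM.get(t, t) for t in tokens if t not in DEFAULT_WORDS]


def is_acceptable(sentence, allowed, used):
    """True if every content word is an allowed, not-yet-used keyword and none repeats."""
    words = keywords_in(sentence)
    if len(words) != len(set(words)):
        return False  # a keyword (or two forms of it) appears twice
    return all(w in allowed and w not in used for w in words)


allowed = {"mouse", "elephant", "fear"}
print(is_acceptable("Mouse fears elephant", allowed, used=set()))
print(is_acceptable("Mouse fears the cat", allowed, used=set()))
```

Note that this check treats "Mouse fears elephant" and "Elephant fears mouse" as equally valid; deciding which one is more plausible is a separate problem.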
I took a peek at TextBlob, but I'm not sure if it's the right tool to achieve my goal. I don't want to waste time learning something that turns out to be useless. I also found some information on Markov chain sentence generators, but I'm not sure if they are powerful enough or if I could use them in combination with something else.
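For reference, a constrained Markov chain generator might look roughly like the sketch below. It is a minimal bigram model built from example sentences, with the random walk restricted to default words and unused allowed keywords. The function names are hypothetical, and whether such a simple model produces good enough sentences from noisy examples is exactly the open question.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def build_bigrams(sentences):
    """Map each word to the list of words seen directly after it."""
    model = defaultdict(list)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev].append(nxt)
    return model


def generate(model, allowed, default_words, rng, max_len=12):
    """Random walk over the chain.

    Only steps to default words or to allowed keywords that have not been
    used yet in this sentence. Returns None on a dead end or if the
    length limit is reached before the sentence ends.
    """
    used, words, current = set(), [], START
    while len(words) < max_len:
        candidates = [
            w for w in model.get(current, [])
            if w == END or w in default_words or (w in allowed and w not in used)
        ]
        if not candidates:
            return None
        current = rng.choice(candidates)
        if current == END:
            return " ".join(words) if words else None
        if current not in default_words:
            used.add(current)
        words.append(current)
    return None


examples = ["the mouse fears the elephant"]
model = build_bigrams(examples)
print(generate(model, {"mouse", "fears", "elephant"}, {"the"}, random.Random(0)))
```

A generator like this would be called repeatedly, shrinking `allowed` after each accepted sentence, and retried whenever it returns None.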
Does anyone have experience with this or know how to generate such random sentences?
I hope I described my problem sufficiently. If there is something I forgot to mention, I will add it as the discussion progresses.