Shakespeare Sonnet Sourced Markov Text Generation

And play the mother’s part, kiss me, be kind;
So will I pray that thou mayst prove me.
Weary with toil, I haste me thence?
Till I return, of posting is no remedy,
It is the time with thoughts of love as oft as thou shalt find
Those children nursed, deliver’d from thy heart,
And take thou my love shall in these black lines be seen,
And they shall live, and he stole that word
From thy behaviour; beauty doth he give,
And found such fair assistance in my will no fair acceptance shine?
The sea, all water, yet receives rain still,
And then thou hast her it is built anew,
Grows fairer than at first, more strong, greater.


Number of key tuples: 14916
Mean Choices: 1.174175
Min Choices: 1
Max Choices: 27
Sum of Squares: 13581.492357
Standard Deviation: 116.539660

I have had a vague “todo” in my head for a while to do something with Markov text generation. One of the things I look forward to when I scan through my spam folders is finding interesting text that was obviously generated. I was lazily browsing through some of my news feeds and following links when I came across this post titled Markov and You by Jeff Atwood. I especially liked the Garkov reference he used to illustrate the technique. I decided to try something similar and found a basic Markov implementation in Python at this Usware blog post.

After playing with the “quick brown fox” implementation and pulling in some CNN articles, I decided to do something a bit more interesting and downloaded Shakespeare’s Sonnets from Project Gutenberg.

I modified my generator implementation to produce output based on the number of lines emitted, and added a reseed function that selects a new start tuple whenever no key matches the current search tuple. Finally, I added some simple output statistics to get a feel for how the constructed database looked as I tweaked the chain length parameter. Shorter chain lengths tended to make the text too random, while longer ones pulled in too much of an existing sonnet sequence. Watching the number of key tuples and the summary statistics of the choices helped me tune the code for this corpus.
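To make those three changes concrete, here is a minimal sketch of how they could fit together. It is not the code linked from this post: the function names (`tokenize`, `build_chain`, `generate`, `chain_stats`), the use of a newline token to mark line ends, and the choice to count repeated followers as separate choices are all assumptions made for illustration. The standard deviation here divides the sum of squared deviations by the number of keys, which is also an assumption about how that statistic is defined.

```python
import math
import random
from collections import defaultdict


def tokenize(text):
    """Split text into words, adding a "\n" token at the end of each non-blank line."""
    tokens = []
    for line in text.splitlines():
        words = line.split()
        if words:
            tokens.extend(words)
            tokens.append("\n")  # assumed convention: newline token marks a line end
    return tokens


def build_chain(tokens, chain_length=2):
    """Map each tuple of `chain_length` consecutive tokens to the tokens seen after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - chain_length):
        key = tuple(tokens[i:i + chain_length])
        chain[key].append(tokens[i + chain_length])
    return dict(chain)


def generate(chain, num_lines, rng=None):
    """Emit `num_lines` lines, reseeding with a random key when the current key is unknown."""
    if not chain:
        raise ValueError("chain is empty; corpus is shorter than the chain length")
    rng = rng or random.Random()
    keys = list(chain)
    lines, current = [], []

    def emit(token):
        # A newline token closes the current line; empty lines are skipped.
        if token == "\n":
            if current:
                lines.append(" ".join(current))
                current.clear()
        else:
            current.append(token)

    key = rng.choice(keys)
    for token in key:
        emit(token)
    while len(lines) < num_lines:
        choices = chain.get(key)
        if choices is None:
            # Reseed: no key matches the current tuple, so pick a new start tuple.
            key = rng.choice(keys)
            for token in key:
                emit(token)
            continue
        word = rng.choice(choices)
        emit(word)
        key = key[1:] + (word,)
    return lines[:num_lines]


def chain_stats(chain):
    """Summarize how many follow-on choices each key tuple has."""
    if not chain:
        raise ValueError("chain is empty")
    counts = [len(followers) for followers in chain.values()]
    n = len(counts)
    mean = sum(counts) / n
    sum_sq = sum((c - mean) ** 2 for c in counts)
    return {
        "keys": n,
        "mean": mean,
        "min": min(counts),
        "max": max(counts),
        "sum_of_squares": sum_sq,
        "std_dev": math.sqrt(sum_sq / n),  # population standard deviation
    }
```

With the sonnets loaded into a string `text`, something like `generate(build_chain(tokenize(text), 2), 14)` would return fourteen lines, and `chain_stats` on the same chain gives the kind of numbers to watch while changing `chain_length`: a mean close to 1 means most tuples have a single possible successor, so the output will mostly replay the source.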

The code is available here.