The best read for something like this is The Practice of Programming, which is a great little book to have in general. The linked sample chapter covers the entire analysis and creation of a Markov chain program, none of which requires many lines of code.
One thing I've noticed from playing with these types of programs is how much the number of words used as the hash key (the prefix) matters. Two is the usual starting point, of course, but two prefix words usually generate nonsense, while five or six reproduce the sample text almost verbatim. The sweet spot for believable output is three or four prefix words. The larger and more varied the sample text, the better the results. Furthermore, breaking words only on whitespace produces better output than tinkering with the punctuation.
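To make the prefix-length trade-off concrete, here is a minimal sketch in Python. It is not the book's code; the names `build_chain`, `generate`, and `prefix_len` are my own, and it assumes the simplest possible setup: split on whitespace only, key a hash table on a tuple of `prefix_len` words, and pick each next word at random from the words that followed that prefix in the sample.

```python
import random
from collections import defaultdict


def build_chain(text, prefix_len=2):
    """Map each tuple of prefix_len consecutive words to the words that follow it."""
    words = text.split()  # whitespace only, so punctuation stays attached to words
    chain = defaultdict(list)
    for i in range(len(words) - prefix_len):
        prefix = tuple(words[i:i + prefix_len])
        chain[prefix].append(words[i + prefix_len])
    return words, chain


def generate(text, prefix_len=2, max_words=50, seed=None):
    """Generate up to max_words words by walking the chain built from text."""
    rng = random.Random(seed)
    words, chain = build_chain(text, prefix_len)
    if len(words) <= prefix_len:
        return " ".join(words)  # sample too short to build any transitions
    output = list(words[:prefix_len])  # start from the opening words of the sample
    while len(output) < max_words:
        followers = chain.get(tuple(output[-prefix_len:]))
        if not followers:  # this prefix only occurs at the very end of the sample
            break
        output.append(rng.choice(followers))
    return " ".join(output)
```

Calling `generate(sample_text, prefix_len=n)` with increasing `n` on the same sample should show the effect described above: once nearly every prefix in the sample is unique, each prefix has only one possible follower and the output is just the original text.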
I know "popularized" is often used colloquially as a synonym for "introduced", but I think I'm well within the accepted meaning of the word if I use it to mean "made popular with or accessible to a larger audience". In fact that meaning is closer to the dictionary definition (and the etymology) than the colloquial sense of "introduced".
https://ptgmedia.pearsoncmg.com/images/9780201615869/samplep...