This was the subject of my honours thesis[1] back in 2000. I was actually inferring DTDs from sample XML documents, but it's the exact problem as DTDs use regular expressions and I only had positive examples to go on.
Based on existing methods my solution started with the same trie and then generalised to a more flexible DFA by merging states. I used information theory (specifically Minimum Message Length) to turn it into an optimisation problem and tried a few different algorithms, in the addition of Ant Colony Optimisation to an existing algorithm produced the best results for my tests. (They were pretty limited, though.)
Based on existing methods my solution started with the same trie and then generalised to a more flexible DFA by merging states. I used information theory (specifically Minimum Message Length) to turn it into an optimisation problem and tried a few different algorithms, in the addition of Ant Colony Optimisation to an existing algorithm produced the best results for my tests. (They were pretty limited, though.)
[1] http://www.cse.unsw.edu.au/~wong/Papers/jason.pdf