Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This was the subject of my honours thesis[1] back in 2000. I was actually inferring DTDs from sample XML documents, but it's the exact problem as DTDs use regular expressions and I only had positive examples to go on.

Based on existing methods my solution started with the same trie and then generalised to a more flexible DFA by merging states. I used information theory (specifically Minimum Message Length) to turn it into an optimisation problem and tried a few different algorithms, in the addition of Ant Colony Optimisation to an existing algorithm produced the best results for my tests. (They were pretty limited, though.)

[1] http://www.cse.unsw.edu.au/~wong/Papers/jason.pdf



Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: