As someone who has published research on protein folding, trust me: compiler opt...

jmzachary · on June 20, 2008

Then we are in the same club. I've published research on global optimization algorithms for protein folding. I'm not a biochemist, but I understood that protein folding actually has immediate application in understanding disease and drug design.

As a computer scientist, I also understood that compiler optimization is a mature field with most of the low-hanging fruit already picked. So, I guess I'm confused and will ask respectfully what problems in compiler optimization make it a thousand times more useful than protein folding and associated medical problems?

timr · on June 20, 2008

"I'm not a biochemist, but I understood that protein folding actually has immediate application in understanding disease and drug design....what problems in compiler optimization make it a thousand times more useful than protein folding and associated medical problems?"

Short answer:

Compilers are used for real work, every day. Nobody is using protein structure prediction for anything practical, and they likely won't be for decades more. At this point, it's blue-sky research.

Long answer:

"Immediate application" is one of those bits of academic-speak that really means "is related to", but sounds better to grant review boards. While it's true that protein folding is important (after all, most biological processes are mediated by folded proteins), it's not true that protein structure prediction is important. It would be great if we could predict protein structures accurately, but we can't, and until we can, it's not a practically useful discipline.

Even the very best, crystallographically determined protein structures are barely sufficient to do rational drug design, and predicted structures don't come close to that level of quality. For example: we can sometimes (very rarely) predict very small (<150 residue) protein structures to within 1 angstrom RMSD of their experimentally determined shapes (i.e. >2 angstrom resolution, in the best case). However, the interactions important to drug binding, protein design, etc., don't start until a tenth of that (scales of ~0.1 angstrom).

Throw in the fact that the vast majority of proteins are much larger than 150 angstroms, and that we keep creating cheaper, faster, more automated ways of getting actual experimental information on structure, and the role of protein structure prediction looks increasingly marginalized. It's definitely a cool, fun problem -- just not a very practical one.

For whatever it's worth, my first papers were on applying the state-of-the-art method (you've heard of it...I think you're paraphrasing the lab's PR) for protein structure prediction to genome annotation. To call the approach useful was/is a stretch, and that's for a much easier application than drug design (in fact, we were trying to find a practical application for protein structure prediction, and it was the most likely thing we could think of!)

jmzachary · on June 21, 2008

Thanks for the very detailed answer. Amazingly, I can follow the gist of it after over 10 years.

But, you still don't answer the question. What problems in compiler optimization are more important than problems in protein folding? You seem to indicate that protein folding is a basic science problem and not a "practically useful discipline". In fact, your statement "It would be great if we could predict protein structure accurately, but we can't, and until we can, it's not a practically useful discipline" says that because the problem isn't solved, it's not important, but it will be important when it's solved. So, trying to solve the problem is important, no?

But, you don't say anything about compiler research, and specifically compiler optimization research and development, which you claimed is much more important. What specific areas in compiler optimization (or just in compiler design) are more important than protein structure prediction and modeling?

timr · on June 21, 2008

I did answer your question, but now you're asking a different one. I have no idea how "important" protein structure prediction will ultimately become; I just know that it's currently pretty useless, and getting worse.

My first comment was that compiler optimization is about a thousand times more useful than protein folding. I stand by that remark. However, at the beginning of my long answer, I mistakenly wrote that protein structure prediction is not important, when I had meant to write that it is not useful: protein folding is an important biological process, but protein structure prediction is not particularly useful, for the reasons I've mentioned.

It's not my place to say which area of research is more important. That's a subjective question, and the answer depends on your value system, your outlook, and your willingness to wait. Obviously, I think that compiler optimization is more useful, because compilers are actually in use today. In 100 years...who knows?

That said, I think you're laboring under the assumption that compiler optimization is a "mature" field, and that it is "solved" (and therefore less important), whereas protein folding is not "solved" (and therefore more important). The thing is, people have been doing protein folding research for at least fifty years -- it is a very mature field, and the low-hanging fruit has been picked. I think that a new researcher is equally likely to make significant gains in either field, but that the potential for practical impact is still much greater in compiler design.

jmzachary · on June 21, 2008

I'm asking the same question, and you're not close to answering it. My question was "what problems in compiler optimization make it a thousand times more useful than protein folding and associated medical problems?" because you authoritatively stated that protein folding work (implicitly computational protein folding work) was much less important/useful (pick one) than compiler optimization work. You keep talking about protein folding research, but you don't say anything about compiler optimization. Was that comparison just an off-hand or self-deprecating remark about protein folding work? I'm not asking which area is more important. And, I know enough about both fields to not get lost in the technical details of your answers. So, I'll ask it again: what problems in compiler optimization make it a thousand times more useful than protein folding and associated medical problems?

timr · on June 21, 2008

If you want to know the most important/useful areas in compiler research, go ask a compiler researcher. I'm not a compiler researcher.

Compilers are used on a daily basis for real work; protein structure prediction is not. For this reason alone, research into compilers is more useful.

gaius · on June 21, 2008

I didn't mean for people to focus on "protein folding"; that was just an example of something that scientists do that's computationally intensive.

I find myself in my early 30s knowing an awful lot about database applications, but with zero domain knowledge. I can go into any field and implement a spec, but I don't understand any of it. If someone wants a graph of this data in their application, I'll give them a great graph, but I look at it and it's just squiggly lines to me. I just feel like I'm missing something.

aswanson · on June 21, 2008

Wow. This is really enlightening; I thought that the computational method was going to open a new phase in disease treatment, but you seem to say here that the empirical method is on its way to making it useless. So the Pande group at Stanford is wasting their time. Interesting.

timr · on June 21, 2008

"So the Pande group at Stanford is wasting their time."

I wouldn't go quite that far. The research is definitely speculative, but lots of interesting things can come from speculative research. My point is that you don't do research into protein structure prediction with the intent of finding anything useful. It's basic science.

We can (and occasionally do) learn things from computer models of proteins. But the PR in this field has been seriously exaggerating the results of a few of the more prominent researchers. We're a long way from curing diseases or designing drugs with this stuff.

aswanson · on June 21, 2008

Is it the weakness of the modeling or the lack of computational horsepower that limits the research in this area? And would you mind linking to your papers?

timr · on June 21, 2008

That's a matter of debate. Some people think that the problem is search limited, others think that the current models are bad. In my opinion, the bulk of the evidence supports the latter conclusion.

Backchannel me, and I'll be happy to provide you with references to the papers I wrote/helped write. Most of them aren't open access, unfortunately.

aswanson · on June 21, 2008

Will do. And thanks for the info, you may have spared me a decade or so of wasted effort.

timr · on June 21, 2008

Oops...that third paragraph should read:

"the vast majority of proteins are much larger than 150 residues."