> It does a surprisingly good job of predicting protein function across a diverse set of tasks, including ones structural in nature, like the induction of a single neuron that is able, with some degree of accuracy (ρ = 0.33), to distinguish between α helices and β strands. (I suspect the network as a whole is far more performant at this task than the single neuron we've identified, but we didn't push this aspect of the analysis as the problem is well tackled using specialized approaches.)
I hate to be that guy, but distinguishing between alpha helices and beta strands is not really that hard.
It's a good start, though. I would propose the following test: let's see if we can use the activations from the neurons to predict the luminosity of a 'base' GFP molecule (under a fixed set of experimental conditions). Train on a set of 10,000 mutations (this could maybe be done in very high throughput by tethering the XNA to a bead, synthesizing, and then measuring the beads one by one), and see if it can extrapolate the effects of 10k more. Or heck, just do it by brute force; we've got high-throughput robots, right?
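The test proposed above is basically supervised regression scored on a held-out set. A minimal sketch, assuming each variant is already reduced to a vector of neuron activations plus one measured luminosity. The data here is synthetic (`synthetic_variants` and its weights are hypothetical), and ridge regression with Spearman's ρ as the score are my choices for illustration, not anything from the paper.

```python
import random


def ridge_fit(X, y, lam=1e-3):
    """Fit y ~ X.w + bias by solving the ridge normal equations.

    A constant 1.0 column is appended for the bias, which is not penalized.
    Returns the weights with the bias as the last element.
    """
    rows = [list(x) + [1.0] for x in X]
    d = len(rows[0])
    # Normal equations (X^T X + lam*I) w = X^T y, leaving the bias unpenalized.
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i < d - 1 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, d):
            f = A[k][col] / A[col][col]
            for j in range(col, d):
                A[k][j] -= f * A[col][j]
            b[k] -= f * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w, x):
    """Linear prediction using the bias stored as the last weight."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def ranks(values):
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def synthetic_variants(n, true_w, seed, noise=0.1):
    """Hypothetical stand-in for (activations, luminosity) measurements."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0, noise)
        data.append((x, y))
    return data


if __name__ == "__main__":
    TRUE_W = [0.8, -0.5, 0.3, 0.0, 1.2]  # hypothetical neuron weights
    train = synthetic_variants(10_000, TRUE_W, seed=1)
    held_out = synthetic_variants(10_000, TRUE_W, seed=2)
    w = ridge_fit([x for x, _ in train], [y for _, y in train])
    preds = [predict(w, x) for x, _ in held_out]
    print(f"held-out Spearman rho = {spearman(preds, [y for _, y in held_out]):.3f}")
```

With real measurements you would replace `synthetic_variants` with the assay data. A random split is the easy case. Holding out whole groups of variants (say, every mutation at some positions) is a tougher check of whether the model actually extrapolates.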
And predicting protein function is not that hard either. The ground truth labels are often determined by sequence alignment similarity, not by experiment, so the results are far from profound.
Doing it right is quite hard.
Doing it usefully is even harder [1].
Getting a good training set without too many biases is the really hard part.
Generating a ground truth that is actually a truth is very expensive.
I have to read the paper carefully again, but for the contact point prediction I think the training set will cover most of the data used in the validation. That is due to the way PDB "sequences" are distributed over UniParc, as well as how PDB 3D structures are generated experimentally. For example, there are 120,000 PDB-related sequences in UniParc, but they cover only 45,000 entries in UniProtKB, because PDB-derived sequences are rarely full length, are often mutated, and are highly redundant in coverage.
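One crude way to check for the overlap I'm worried about: for each validation sequence, measure how much of it is already contained in some training sequence. A minimal sketch, using k-mer containment as a cheap stand-in for a proper alignment-based identity filter (in practice you'd cluster with something like BLAST or MMseqs2). Containment is asymmetric on purpose, so a truncated PDB fragment of a full-length training sequence still gets flagged. The sequences, `k = 5` and the 0.5 threshold are all made up for illustration.

```python
def kmers(seq, k):
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, reference, k=5):
    """Fraction of the query's k-mers that also occur in the reference.

    Asymmetric on purpose: a short fragment of a long training sequence
    scores 1.0 even though a global identity score would be low.
    """
    q = kmers(query, k)
    if not q:
        return 0.0
    return len(q & kmers(reference, k)) / len(q)


def flag_leaky(validation, training, k=5, threshold=0.5):
    """Names of validation sequences mostly contained in some training sequence."""
    flagged = []
    for name, seq in validation.items():
        best = max((containment(seq, ref, k) for ref in training.values()), default=0.0)
        if best >= threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    training = {"train_1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEV"}
    validation = {
        "pdb_fragment": "QISFVKSHFSRQLE",  # truncated piece of train_1
        "pdb_point_mutant": "MKTAYIAKQRQISFVKSHWSRQLEERLGLIEV",  # F19W
        "unrelated": "GSHMWDDEQVLNELAEKRRKG",
    }
    print(flag_leaky(validation, training))  # prints ['pdb_fragment', 'pdb_point_mutant']
```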
[1] Predicting the root GO terms will give you an insane TP/FP rate but is completely useless.
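To make the footnote concrete: once annotations are propagated up the GO graph, every protein with a molecular-function annotation also carries the root term molecular_function (GO:0003674), so predicting it for everything scores perfectly while saying nothing. Information content, IC = log2(N / n_term), is one common way to expose that, since the root gets 0 bits. A minimal sketch with hypothetical proteins and toy term names; per-term precision/recall is a simplification of how real function-prediction benchmarks score predictions.

```python
import math


def evaluate(predicted, truth, term):
    """Precision and recall of predicting `term` for every protein in `predicted`."""
    tp = sum(1 for p in predicted if term in truth[p])
    fp = len(predicted) - tp
    fn = sum(1 for p, terms in truth.items() if term in terms and p not in predicted)
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def information_content(truth, term):
    """IC = log2(N / n_term); 0 bits for a term that every protein carries."""
    n_term = sum(1 for terms in truth.values() if term in terms)
    return math.log2(len(truth) / n_term)


if __name__ == "__main__":
    # Hypothetical proteins; annotations already propagated to the root.
    # "molecular_function" stands for the root term GO:0003674; the other
    # names are toy labels, not real GO identifiers.
    truth = {
        "P1": {"molecular_function", "catalytic_activity", "hydrolase_activity"},
        "P2": {"molecular_function", "catalytic_activity", "transferase_activity"},
        "P3": {"molecular_function", "binding"},
        "P4": {"molecular_function", "binding", "protein_binding"},
    }
    # Trivial predictor: the root term for every protein.
    print(evaluate(set(truth), truth, "molecular_function"))    # (1.0, 1.0)
    print(information_content(truth, "molecular_function"))     # 0.0
    # A specific (and more useful) guess looks worse on paper.
    print(evaluate({"P1", "P2"}, truth, "hydrolase_activity"))  # (0.5, 1.0)
    print(information_content(truth, "hydrolase_activity"))     # 2.0
```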
I think you are misusing the term "sucker" in this context. They are getting paid regardless of what happens to other people; that doesn't sound like they are being stupid.