Hacker News | tepal's comments

Or OpenFold, which is the more literal reproduction of AlphaFold 2: https://github.com/aqlaboratory/openfold


Time for an OpenFold3? Or would it be an OpenFold2?


This blog post seems to anticipate this happening: https://moalquraishi.wordpress.com/2019/04/01/the-future-of-...


> It does a surprisingly good job of predicting protein function across a diverse set of tasks, including ones structural in nature, like the induction of a single neuron that is able, with some degree of accuracy (ρ = 0.33) to distinguish between α helices and β strands (I suspect the network as a whole is far more performant at this task than the single neuron we’ve identified, but we didn’t push this aspect of the analysis as the problem is well tackled using specialized approaches.)

I hate to be that guy, but distinguishing between alpha helices and beta strands is not really that hard.

It's a good start, though. I would propose the following test: see whether the activations from the neurons can be used to predict the luminosity of a 'base' GFP molecule (under a fixed set of experimental conditions). Train on 10,000 mutations (this could perhaps be done at very high throughput by tethering the XNA to a bead, synthesizing, and then measuring the beads one by one), and see if the model can extrapolate the effects of 10,000 more, or, heck, just do it by brute force; we've got high-throughput robots, right?
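The proposed test can be sketched as a simple regression: use per-mutant neuron activations as features and fit on one batch of mutants, then check how well the fit extrapolates to a held-out batch. Everything below is a toy stand-in; the activations and brightness values are synthetic, since no real dataset is given in the comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each mutant is represented by the model's neuron
# activations (faked here as random 64-dim vectors), and brightness is a
# noisy linear function of a handful of those neurons.
n_train, n_test, n_neurons = 10_000, 10_000, 64
X = rng.normal(size=(n_train + n_test, n_neurons))
true_w = np.zeros(n_neurons)
true_w[:5] = rng.normal(size=5)            # only a few neurons matter
y = X @ true_w + 0.1 * rng.normal(size=n_train + n_test)

X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# Ridge regression in closed form: w = (X'X + lam*I)^-1 X'y
lam = 1.0
w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(n_neurons),
                    X_train.T @ y_train)

pred = X_test @ w
rho = np.corrcoef(pred, y_test)[0, 1]      # correlation on held-out mutants
print(f"held-out correlation: {rho:.2f}")
```

If the activations really do encode brightness-relevant structure, the held-out correlation should stay high; if the model only memorized the training mutations, it collapses.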


And predicting protein function is not that hard either. The ground-truth labels are often determined by sequence-alignment similarity, not by experiment, so the results are far from profound.


Doing it right is quite hard, and doing it usefully is even harder [1]. Getting a good training set without too many biases is the really hard part, and generating a ground truth that is actually true is very expensive.

I have to read the paper carefully again, but for the contact-point prediction I think the training set will cover most of the data used in the validation, due to the way PDB "sequences" are distributed over UniParc, as well as the way PDB 3D structures are generated experimentally. That is, there are 120,000 PDB-related sequences in UniParc, but they cover only 45,000 entries in UniProtKB, because PDB-derived sequences are rarely full length, often mutated, and highly duplicative in coverage.
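The leakage concern above can be made concrete: when PDB-derived fragments of the same UniProt entry land on both sides of a train/validation split, the validation set is no longer independent. A crude way to flag this is to compare each validation sequence against the training set; the sketch below uses a made-up 3-mer Jaccard similarity rather than a real alignment, and all sequences are invented for illustration.

```python
# Flag validation sequences that are near-duplicates of training sequences.
# A real pipeline would use an aligner (e.g. MMseqs2/BLAST) and an identity
# cutoff; 3-mer Jaccard is just a cheap stand-in.

def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    ka, kb = kmers(a), kmers(b)
    return len(ka & kb) / len(ka | kb)

train = ["MKTAYIAKQRQISFVKSHFSRQ", "GSSGSSGLVPRGSH"]
valid = ["MKTAYIAKQRQISFVKSHFSR",   # truncated copy of a training sequence
         "ACDEFGHIKLMNPQRSTVWY"]    # unrelated

leaky = [v for v in valid
         if any(jaccard(v, t) > 0.8 for t in train)]
print(leaky)
```

Here the truncated duplicate is flagged while the unrelated sequence passes, which is exactly the kind of filtering a contact-prediction validation set would need before its numbers mean much.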

[1] Predicting the root GO terms will give you an insane TP/FP rate but is completely useless.
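The footnote's point is that the root baseline is trivially "correct": every annotated protein sits under the GO root by definition, so a predictor that always answers the root scores a perfect true-positive rate while carrying no information. A minimal illustration, with made-up protein IDs:

```python
# Every protein is (at least implicitly) annotated with the root GO term,
# so always predicting it yields a 100% TP rate and zero information.
ROOT = "GO:0003674"  # molecular_function root

proteins = {
    "P1": {"GO:0003674", "GO:0016787"},   # hydrolase
    "P2": {"GO:0003674", "GO:0005215"},   # transporter
    "P3": {"GO:0003674"},                 # nothing known beyond the root
}

def predict(_seq_id):
    return {ROOT}                          # "predict" the root for everything

tp = sum(ROOT in anns for anns in proteins.values())
print(f"TP rate: {tp / len(proteins):.0%}")
```

Any evaluation that doesn't down-weight shallow terms (e.g. by information content, as CAFA-style metrics do) rewards exactly this degenerate strategy.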


Please sign this petition to stop the tramway into the Grand Canyon: http://grandcanyontrust.nonprofitsoapbox.com/escalade


I think those suckers in Congress should not be paid when they shut down the government.


I think you are misusing the term "sucker" in this context. They are getting paid regardless of what happens to other people; that doesn't sound like they are being stupid.

The suckers are those that vote red vs blue.


Need more signatures! Come on, people!


Please sign the two petitions asking the White House to hold the prosecution accountable.

