For the first time, with GPT-4, OpenAI has been able to predict model progress with accuracy:
> A large focus of the GPT-4 project has been building a deep learning stack that scales predictably. The primary reason is that, for very large training runs like GPT-4, it is not feasible to do extensive model-specific tuning. We developed infrastructure and optimization that have very predictable behavior across multiple scales. To verify this scalability, we accurately predicted in advance GPT-4’s final loss on our internal codebase (not part of the training set) by extrapolating from models trained using the same methodology but using 10,000x less compute:
> Now that we can accurately predict the metric we optimize during training (loss), we’re starting to develop methodology to predict more interpretable metrics. For example, we successfully predicted the pass rate on a subset of the HumanEval dataset, extrapolating from models with 1,000x less compute:
> We believe that accurately predicting future machine learning capabilities is an important part of safety that doesn’t get nearly enough attention relative to its potential impact (though we’ve been encouraged by efforts across several institutions). We are scaling up our efforts to develop methods that provide society with better guidance about what to expect from future systems, and we hope this becomes a common goal in the field.
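The extrapolation they describe is the standard scaling-law trick: loss follows an approximate power law in training compute, which is linear in log-log space, so a fit on small runs can be extended to much larger ones. A minimal sketch with synthetic data (the constants and compute values here are illustrative assumptions, not OpenAI's actual numbers or method):

```python
# Hedged sketch of scaling-law extrapolation: fit L(C) = a * C^(-b)
# to small-compute runs, then predict loss at 10,000x more compute.
# All constants below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20])  # FLOPs of small runs
true_a, true_b = 50.0, 0.05                          # assumed power law
# Synthetic measured losses with a little log-space noise.
loss = true_a * compute ** -true_b * np.exp(rng.normal(0, 0.002, compute.size))

# A power law is a line in log-log space: log L = log a - b * log C.
slope, log_a_fit = np.polyfit(np.log(compute), np.log(loss), 1)
b_fit = -slope

def predict_loss(c):
    """Extrapolate the fitted power law to compute budget c."""
    return np.exp(log_a_fit) * c ** -b_fit

# Predict the loss of a run with 10,000x the largest small-run compute.
big_run = 1e20 * 1e4
print(f"fitted exponent b = {b_fit:.4f}")
print(f"predicted loss at {big_run:.0e} FLOPs: {predict_loss(big_run):.3f}")
```

The point of the paper's claim is that this kind of fit held up at GPT-4 scale; the commenter below questions whether that claim has been independently verified.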
Isn't this all based on self-attestation? There is no comprehensive audit of their research data and finances that I am aware of. If I were OpenAI and had blown millions of dollars training models that showed exponentially worse performance for incrementally more resources spent on training, my next step would not be to publish about it.
> Birds are susceptible to a respiratory condition called "teflon toxicity" or "PTFE poisoning/toxicosis." Deaths can result from this condition, which is due to the noxious fumes emitted from overheated cookware coated with polytetrafluoroethylene (PTFE).
Sure, but burning Teflon and inhaling the fumes is much different from eating pieces of Teflon that flake off in your pan. Plenty of otherwise-harmless things are suddenly not harmless when you burn and inhale them.
A type system is for internal consistency, though; facts are about external consistency with real-world data. And even then, facts are always socially situated, in that they are captured in a given social context, which includes the lenses of theoretical frameworks and axiomatics. They always carry a spin they can lose when considered from another standpoint, and at the very minimum they are conditioned by attention and relevance. That has everything to do with our current representation of the world and nothing to do with the world itself.
I don't get your argument about the frame problem. Maybe it's like squeezing a big pillow into a small bag: a bulge forms that won't fit, and that's the frame problem. Turn the pillow around and squeeze it into the bag again, and a bulge now forms on the opposite side: that's the hallucination problem. I can see how one could be the solution to the other. Hallucinations as a lack of rules.