I wouldn't trust any of these benchmarks unless they're accompanied by some sort of proof beyond "trust me bro". Not publishing the parameters the models were run with (temperature, sampling settings, and especially the settings used for the competing models) also makes fair comparisons impossible. They need to publish, at minimum, the code and harness used to run the benchmarks, plus the raw logs.
Not including the Chinese models is also obviously done to make their own numbers look less cooked than they really are.
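Even a minimal harness that logs every sampling parameter next to every output would make the numbers checkable. A rough sketch of what I mean (Python; the model call, model name, and parameter values here are placeholders, not anyone's published setup):

    import hashlib
    import json
    import time

    def log_run(log_path, model, params, prompt, output):
        # One self-describing record per generation: model id, sampling
        # params, prompt (plus a hash for quick diffing), raw output.
        record = {
            "model": model,
            "params": params,
            "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
            "prompt": prompt,
            "output": output,
            "timestamp": time.time(),
        }
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def fake_generate(prompt, **params):
        # Stand-in for the real client call; swap in whatever SDK you use.
        return "stub output"

    params = {"temperature": 0.0, "top_p": 1.0, "seed": 42, "max_tokens": 512}
    prompt = "Write a function that reverses a linked list."
    log_run("runs.jsonl", "model-x", params, prompt, fake_generate(prompt, **params))

Publish a runs.jsonl like that for every model in the comparison table and the argument is over.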
The problem with this is context. Whatever examples you provide compete for context space with the content you actually want analyzed. If the problem is sufficiently complex, you quickly run out of room, and you also have to spend tokens describing the response format you want. For many applications, it's better to fine-tune.
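You can see how fast the overhead adds up just by counting tokens. A quick illustration with tiktoken (the 8k window and the prompt strings are made up for the example):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def count(text):
        return len(enc.encode(text))

    context_window = 8192  # illustrative; varies by model
    system_prompt = "You are a code reviewer. Respond as JSON: {\"issues\": [...]}"
    few_shot_examples = [
        "Input: <500-line diff> Output: <annotated review>",
        "Input: <another diff> Output: <another review>",
    ]

    overhead = count(system_prompt) + sum(count(e) for e in few_shot_examples)
    print(f"Prompt overhead: {overhead} tokens")
    print(f"Left for the actual content + response: {context_window - overhead}")

With realistic multi-hundred-line examples instead of those stubs, the overhead is easily thousands of tokens per request, which is exactly the cost fine-tuning amortizes away.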
It's really easy to overcome that -- just sponsor some indie devs to flood the internet with scripts and tools that migrate all your conversations out of OpenAI. Make switching a simple, well-distributed process and BOOM, watch their user count drop like a rock. People act like a service with a lot of users can't be destroyed. Anyone who has ever worked at a large web company can tell you otherwise: these things can be destroyed in just a few days if they're targeted.
They look like fortresses from the outside, but they're all incredibly vulnerable, and that's the truth they don't want people to realize.
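The migration tooling really is the easy part. A toy sketch of the shape such a script would take (the input schema below is a simplified assumption, not the actual export format, so adapt it to whatever conversations.json really contains):

    import json

    def to_portable(export_path):
        # Convert an exported conversation dump into a neutral format any
        # competing service could import. Schema here is assumed/simplified.
        with open(export_path) as f:
            conversations = json.load(f)
        portable = []
        for conv in conversations:
            portable.append({
                "title": conv.get("title", ""),
                "messages": [
                    {"role": m["role"], "content": m["content"]}
                    for m in conv.get("messages", [])
                ],
            })
        return portable

    with open("portable.json", "w") as f:
        json.dump(to_portable("conversations.json"), f, indent=2)

The hard part isn't the code, it's distribution, which is exactly why you'd sponsor people to build and spread it.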
I keep hearing people say "but as humans we actually understand". What evidence do you have of a material difference between the understanding an LLM has and the version a human has? What processes do we fundamentally perform that an LLM does not or cannot? And what definition of "understanding" is being used here, such that humans do it and an LLM presumably does not?
Well, one material difference is that we don't input/output in tokens, I guess. We have a concept of gaps and limits in our knowledge; we have factors like ego, self-preservation, and ambition feeding into our thoughts, where an LLM just has raw data. Understanding the implication of a code change means having an idea of a desired structure, some sense of where you want to head and how it all meshes together. An LLM has none of that. Just because it can copy the output produced by those factors doesn't mean it operates the same way.
It's not an easy thing to box up and ship, unfortunately. Also, another problem (and another incentive to strip the hardware and stuff the column with diodes and capacitors) is that it has seen a lot of salt air exposure from a nearby reef tank sump. In short, you don't want this particular one.
You can only change the rules, you can never stop The Game™. Now, more than ever before, it's faster and easier to create something and deliver its perceived value at scale. Nerds used to rule the roost in tech because they were willing to invest the time and toil in obscurity. That's no longer the case. The only skills you need now are sales and showmanship. A chatbot can do the rest.