This is as meaningful as saying most hominids can't count. You can't usefully generalize about AI models at the current rate of change. Any statement or comparison about AI has to name specific models and versions; otherwise it's increasingly irrelevant noise.
Every time someone has said "LLMs can't do X", I've tried X in GPT-4 and it could do it. They usually try free LLMs like Bard or GPT-3 and assume that the results generalise.