prodigycorp's comments | Hacker News

I hereby propose we rename the HN frontpage to "Claude Customer Support"

Hey, it's Jack/Bob/Jill/Jane from Claude here! We don't give a shit about your issue. Have a nice day.

i run qwen 3.6. you need to drink some settle down juice.

No way it's awesome.

Am I the only one who thinks that your stop hook is written extremely poorly? Not only that, but you're writing to the LLM like an abusive human. No wonder it wants to go home.

Thank you for sharing this. Not often do you get projects that expand your imagination about what we can do with these models.

To me the opus flamingo is waaaay better than the qwen one. qwen has the better pelican, though.


He's contributed to improving trust on this forum. And idk if you've ever gotten commercial samples before but he's right, the amount they give you is immaterial to hurting them and can be material to you.

Also, he said that he's somehow stumbled into having a commercial bottling license. If him, why not us?


> the amount they give you is immaterial to hurting them and can be material to you.

It’s immaterial to Apple if I stole an iPhone from the Apple Store too.

> Also, he said that he's somehow stumbled into having a commercial bottling license. If him, why not us?

Because he is a commercial bottler who is sampling from a different supplier that he intends to source from for his business.


I don't think people have been keeping track, but OpenAI has been hiring a murderers' row of developers for their Codex team.


Meh, Gehrig and Ruth are long since dead.


Comments trashing this are rightly skeptical; they remember the benchmaxxing of llama 4. This model was out in the woods as early as like a couple months ago but they didn't release it because it was at gemini 2.5 pro levels.


> This model was out in the woods as early as like a couple months ago but they didn't release it because it was at gemini 2.5 pro levels.

Source? (Even if rumor)


NYTimes had a story about this (March 12):

> Meta’s new foundational A.I. model, which the company has been working on for months, has fallen short of the performance of leading A.I. models from rivals like Google, OpenAI and Anthropic on internal tests for reasoning, coding and writing, said the people, who were not authorized to speak publicly about confidential matters.

> The model, code-named Avocado, outperformed Meta’s previous A.I. model and did better than Google’s Gemini 2.5 model from March, two of the people said. But it has not performed as strongly as Gemini 3.0 from November, they said.

> They added that the leaders of Meta’s A.I. division had instead discussed temporarily licensing Gemini to power the company’s A.I. products, though no decisions have been reached.

https://www.nytimes.com/2026/03/12/technology/meta-avocado-a...

https://archive.is/uUV5h#selection-715.98-715.277


[flagged]


If you are trying to come up with anti-media conspiracies, there are always plenty of ways to do it against any media company.

The idea that NY Times is particularly anti-Meta seems a stretch. They - like most traditional media companies - are anti-tech in general. The fact they also collect data doesn't make their reporting untrue.

Personally I think a much more interesting rumor to make up would be that Yann LeCun (who famously had his reporting lines rearranged to go through Alexandr Wang after the Scale.ai acquihire) works at New York University.

New York University is in the same place as the New York Times.

There's a conspiracy for you. I made it up, but I mean it could be true I guess?

(Of course LeCun also publicly congratulated Wang on the launch of the model. But maybe that's a ruse to hide everything... blah blah)


>They - like most traditional media companies - are anti-tech in general. The fact they also collect data doesn't make their reporting untrue.

(sigh) In olden times you would have been free to use the em dash as you pleased. Unfortunately, now it's considered signal that you're an AI bot.


Readers here can't fathom that the NYT has inherent bias in a lot of its reporting


Does Meta not harvest data on a massive scale? Not sure what exactly is the issue with doing a series on that.


So llama4 is great? Have you been using it?


It was from a Techmeme Ride Home podcast episode where the host discussed "sources at the company said". I don't remember which day's episode it was.


The llama4 series was one of the earliest large MoEs to be made publicly available. People just ignored it because they were focused on running smaller, denser models at the time; we should know better these days.


DeepSeek R1 was a publicly available MoE model that was getting a ton of attention before llama4. Llama4 didn't get much attention because it wasn't good.


Also, Gemini 2.5 Pro launched a week before Llama 4.

It was Gemini 2.5 Pro that redeemed Google in the eyes of most people as a valid competitor to OpenAI instead of as a joke, so Meta dropping the ball with Llama 4 was extra bad.


the models were objectively horrible


They really weren't horrible. They were ~gpt4o, with the added benefit that you could run them on premise. Just "regular" models, non-"thinking". Inefficient architecture (in the ratio of active to total parameters) but otherwise "decent" models. They got trashed online by bots and Chinese shills (I was online that weekend when it happened; it was something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.
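To make the active-vs-total point concrete, here's a quick back-of-the-envelope sketch (the parameter counts are approximate and from memory, so treat them as assumptions rather than official specs):

    # Rough active/total parameter ratios for the Llama 4 MoE variants
    # (counts are approximate / from memory, not official specs)
    models = {
        "llama-4-scout-17b-16e": {"active": 17e9, "total": 109e9},
        "llama-4-maverick-17b-128e": {"active": 17e9, "total": 400e9},
    }
    for name, p in models.items():
        ratio = p["active"] / p["total"]
        print(f"{name}: ~{ratio:.0%} of weights active per token")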


> They were ~gpt4o, with the added benefit that you could run them on premise.

No, they are bad models. They were benchmaxxed on LMArena and a few other benchmarks, but as soon as you try them yourself they fall to pieces.

I have my own agentic benchmark[1] I use to compare models.

Llama-4-scout-17b-16e scores 14/25, while llama-4-maverick-17b-128e scores 12/25.

By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!) - even GPT3.5 scores 13/25 (with some adjustment because it doesn't do tool calling).

Llama 4 was a bad model, unfortunately.

[1] https://sql-benchmark.nicklothian.com/#all-data
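For context on how those scores come about, here's a minimal sketch of the kind of pass/fail scoring loop involved (illustrative only, not the actual benchmark code; run_task and the task format are hypothetical):

    # Illustrative pass/fail scoring over a set of agentic tasks.
    # run_task is a hypothetical callable that drives one task to completion
    # (tool calls included) and returns the model's final answer.
    def score_model(model_name, tasks, run_task):
        passed = 0
        for task in tasks:
            answer = run_task(model_name, task["prompt"])
            if answer == task["expected"]:
                passed += 1
        return f"{model_name}: {passed}/{len(tasks)}"

Each score above reads as tasks passed out of 25.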


> By comparison gemma-4-E4B-it-GGUF:Q4_K_M scores 15/25 (that is a 4B parameter model!)

Gemma 4 E4B is slightly confusingly named; it's an 8B param model.


You are completely right on both counts.

It is an 8B model, and it is confusingly named. In fact I made exactly the same point[1] when it was released and promptly forgot!

[1] https://news.ycombinator.com/item?id=47622694


Wrote a longer comment steel-manning this, posted it as a reply, then realized you might like to know they had a reasoning model on deck, ready for release in the next 2-4 weeks.

It got shitcanned due to bad PR & God-King Zuck terraforming the org, so there'd be a year's delay to the next release.

A real tragicomedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but it's what happened.


Nah, I remember how disgusted I felt trying llama 4 maverick and scout. They were both DOA... couldn't even beat much smaller local models.


I'll cosign what you said; simultaneously, your interlocutor's point is also well-founded, and it depresses me that it's not better known and sounds so... off... due to conventional wisdom and God-King Zuck misunderstanding his own company and overreacting.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] My understanding is the longest heads-up was single-digit days, if any. Most modellers have arrived at 2+ weeks now; there's a lot between spitting out logits and parsing and delivering a response.


Your comments seem to imply the engineers made a great product but Zuck intervened so now it's shit


I don't know how Zuck intervening could change float32s in a trained model, so I don't think I think that, but maybe I'm parsing your words incorrectly.


failing non-stop at tool calls on top of that.


Thanks for calling me a bot. Llama4 and Meta AI suck.


The way you put it, I understand it less. lol


So the answer is: no. lol. Remember Llama 4 Behemoth, and how we were supposed to get more great models from it?

