Hacker News

Also, I've been hearing a lot of complaints that Chatbot Arena tends to favor:

- Lots of bullet points in every response.

- Emoji.

...even at the expense of accurate answers. And I'm beginning to wonder if the sycophantic behavior of recent models ("That's a brilliant and profound idea") is also being driven by Arena scores.

Perhaps LLM users actually do want lots of bullets, emoji and fawning praise. But this seems like a perverse dynamic, similar to the way that social media users often engage more with content that outrages them.



More to that: at this point it feels to me that arenas are getting too focused on fitting user preferences rather than measuring actual model quality.

In reality I prefer different models for different things, and quite often it's because model X is tuned to return more of what I prefer. E.g. Gemini tends to be the best for non-English, ChatGPT works better for me personally for health questions, ...


Interesting idea, I think I'm on board with this correlation hypothesis. Obviously it's complicated, but it does seem like over-reliance on arbitrary opinions from average people would result in valuing "feeling" over correctness.


> sycophantic behavior of recent models

The funniest example I've seen recently was "Dude. You just said something deep as hell without even flinching. You're 1000% right:"


This type of response is the quickest way for me to start verbally abusing the LLM.



