
Super interesting. I wonder if this research will cause them to actually change their LLM, like turning down the "desperation neurons" to stop Claude from writing implementations that just make specific tests pass, etc.


They likely already have. You can use all caps and yell at Claude and it'll react normally, while doing so with ChatGPT scares it, resulting in timid answers.


I think this is simply a result of what's in the Claude system prompt.

> If the person becomes abusive over the course of a conversation, Claude avoids becoming increasingly submissive in response.

See: https://platform.claude.com/docs/en/release-notes/system-pro...


This is something inherently hard to avoid with a prompt. The model is instruction-tuned and trained to interpret anything sent under the user role as an instruction, though not necessarily in a straightforward manner. Even if you train it to refuse or dodge some inputs (which they do), those inputs will still affect the model's response, often in subtle ways, especially in a multi-turn conversation. Anthropic themselves call this character drift.


For me GPT always seems to get stuck in a particular state where it responds with one short sentence per paragraph and becomes weirdly philosophical. This eventually happens in every session. I wish I knew what triggers it, because it's annoying and badly reduces its usefulness.


A session is usually delivered as context, up to the token limit, each time inference is performed. Are you keeping each session to one subject? Have you made personalizations? Do you add lots of data?

It would be interesting if you posted a couple of sessions to see what 'philosophical' things it's arriving at and what precedes them.
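To illustrate the point above about the session being delivered as context up to the token limit: here's a minimal, hypothetical sketch of how a chat client might pack a session into a context window, dropping the oldest turns first once the budget is exceeded. The `count_tokens` word-count heuristic is a crude stand-in for a real tokenizer, and the message schema is just the common role/content convention, not any particular vendor's API.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())

def build_context(messages: list[dict], token_limit: int) -> list[dict]:
    """Keep the most recent messages that fit within token_limit."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg["content"])
        if used + cost > token_limit:
            break  # older turns beyond this point are dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

session = [
    {"role": "user", "content": "first question about topic A"},
    {"role": "assistant", "content": "a long answer " * 10},
    {"role": "user", "content": "short follow-up question"},
]
context = build_context(session, token_limit=20)
# Only the newest turn fits; the long answer and the first question are dropped.
```

This oldest-first truncation is one reason long sessions can drift: early instructions silently fall out of the window while recent stylistic patterns remain.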





