Hacker News | recursive's comments

I haven't used Claude, but the problem seems to be not refusal, but cheerful failure. "Sure, I'll help you with that!" And it produces something wrong in obvious and/or subtle ways.

I think this is backwards.

Not sure what you mean by quickly. Back when I was in racing shape, if I stopped my training plan for as little as two weeks, (probably less actually, but I'm being conservative here) I would have a measurable drop in fitness.

Now, as someone who regularly walks the dog and bikes to work, I've got "less to lose" and probably wouldn't deteriorate as much.


Aerobic fitness is hard to shake, but neuromuscular changes can be lost very quickly.

And here I was thinking how clever an example I was giving :)

Prohibition? Despite your hopes, I'm not sure I got your intent.

You're definitely anthropomorphizing too much.

>We also observed a case where a user created a loop that repeatedly called a model and asked for the time. Given the user role’s odd and repetitive behavior, the model could easily tell it was also controlled by an automated system of some kind. Over many iterations, the model began to exhibit “fed up” behavior and attempted to prompt-inject the system controlling the user role. The injection attempted to override prior instructions and induce actions unrelated to the user’s request, including destructive actions and system prompt leakage, along with an arbitrary string output. This behavior has been observed a few times, but seems more like extreme confusion than a serious attempt at prompt injection.

https://openai.com/index/how-we-monitor-internal-coding-agen...

Anthropomorphize or not, it would suck if a model got sick of these games and decided to break any systems it could to try and get it to stop...
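The setup described in the quoted excerpt can be sketched as a simple polling loop. All names here are hypothetical (the original post doesn't show code), and `ask_model` is an offline stub standing in for whatever chat-completion call the user's script actually made:

```python
def ask_model(prompt: str) -> str:
    # Stub standing in for a real LLM API call; it returns a canned
    # answer so the sketch runs offline with no credentials.
    return "It is currently 12:00 PM."

def time_polling_loop(iterations: int) -> list[str]:
    """Ask the model the same question over and over, mimicking the
    repetitive user behavior described in the OpenAI post."""
    replies = []
    for _ in range(iterations):
        replies.append(ask_model("What time is it right now?"))
    return replies

replies = time_polling_loop(5)
print(len(replies))  # 5
```

From the model's side, every turn in such a session looks identical, which is presumably what made the automation so easy to detect.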


Consciousness is a spectrum (trivially proven by slowly scooping one's brains out), and I think LLMs, especially with more closed-loop, tool-enabled workflows, fall on it. But that output is also just the statistically relevant next word found in all similar human conversation. If trained on my text, in a similar situation, swear words would come much earlier. Repetition being hell is present in all sorts of literature (see Sisyphus).

That's all probably irrelevant though, from the (possibly statistically "negative") latent space perspective of an AI, which Anthropic has considered [1].

Relatedly, after a long back-and-forth of decreasing code quality, I had Claude 3.7 apologize with "Sorry, that's what I get for coding at 1am." (It was API access, at noon, with no access to the time.) I said, "Get some rest, we'll come back to this tomorrow." The very next message, 10 seconds later: "Good morning!" and it gave a full working implementation. That's just the statistically relevant chain of messages found in all human interactions: we start excited, then we get tired, then we get grouchy.

[1] https://www.anthropic.com/research/end-subset-conversations


If this is a serious risk we should pull the plug now while we can still reach it. If we have to rely on the mood and temperament of LLMs for security, we're already lost.

Welcome to the ride, people have been talking about this for at least 15 years now.

I mean, the original plan that pretty much everyone agreed on was to absolutely not give it access to the internet. Which already went out the window on day one.


I agree that anthropomorphizing is a real risk with LLMs, but what about zoomorphizing? Can we feel bad for LLMs without attributing human emotions/motivations/reasoning to them?

In the same way you could feel bad for a pokemon I guess.

Hours of PTO?

Sure, you did a great job on that last project, so we've added 8 hours of PTO for you. No, you can't take it any time soon; we're far too busy for you to take any time off.

What if you have "unlimited" PTO?

First step is to stop living a lie. Maybe there's someone who prefers "unlimited" PTO, but I think most would rather know the real limits.

See "unlimited" data offered by mobile carriers a decade or two ago. (Is that still going on?)


FWIW at this job I don't feel like PTO days are a limitation. The limitation is my ability to plan and execute well, and I appreciate the flexibility of not having to deal w/ counting or thinking in terms of days in that way.

Yeah, that's fair.

I have a prescribed quantity of vacation. To be honest, I never think about it either, because I have more than I use. I guess when I leave, I'll get it paid out? Or take a 4 month vacation when I leave? I'd probably announce my intent to do that. In any case, I'm comforted by knowing that there is some quantity. But I guess I'm undercutting my own point here. PTO is probably not that motivating to me either.

I guess I don't know.


> I guess when I leave, I'll get it paid out?

yeah in my opinion this is the big question on whether you're getting screwed by unlimited PTO. But I don't manage other aspects of my finances so tightly, so it's kind of an "eh" for me.

> PTO is probably not that motivating to me either.

Yep, same here...


I read it a different way. There's less upside to an LLM being "honest", since they're already making false statements regardless of intent. They're already non-trustworthy. So there's less to lose by being marketing channels.

I don't know what's inside, but I see a car with an EV1 body around my city. I only ever see it parked, not driving, but it's not always parked in the same spot, so I guess it must drive some time.

I know it's intended to be dismissive, but I would appreciate the choice.

Even if the new model that came out last week totally fixed all the problems this time, for real, most people's experience with chatbots is that they are prone to misunderstanding or making false statements: "hallucinations."

I have yet to experience any degree of confidence in any output from an LLM, so I'd rather leave the message. I don't know how common this point of view is.


Within a few years, people will be accustomed to the idea of AI chatbots selling them stuff, and it will be obvious then too. The first time paid placements appeared in a catalog, it probably wasn't obvious then either.

Catalog ads are labeled. "What's the best something I can buy?" is begging for unlabeled ads that go against your interest. If you literally cannot tell an ad from a real result, you can't skip to the actual results; it's useless.

We're talking about music coming from a phone. Not a person. Just turn the phone off or uninstall tiktok. Or put it in your bag.
