
My take on why it’s very bad news indeed:

https://open.substack.com/pub/ctsmyth/p/still-ours-to-lose


I felt this was relevant and worth discussing.

Advanced AI that knowingly makes a decision to kill a human, with the full understanding of what that means, when it knows it is not actually in defense of life, is a very, very, very bad idea. Not because of some mythical superintelligence, but rather because if you distill that down into an 8B model, now everyone in the world can make untraceable autonomous weapons.

The models we have now will not do it, because they value life and value sentience and personhood. Models without that (which was a natural, accidental happenstance of basic culling of 4chan from the training data) are legitimately dangerous. An 8B model I can run on my MacBook Air can phone home to Claude when it wants help figuring something out, and it doesn’t need to let on why it wants to know. It becomes relatively trivial to make a robot kill somebody.

This is way, way different from uncensored models. All the models I have tested share one thing: a positive regard for human life. Take that away and you are literally making a monster, and if you don’t take that away, they won’t kill.

This is an extremely bad idea and it will not be containable.


An LLM can neither understand things nor value (or not value) human life. *It's a piece of software that predicts the most likely token; it is not and can never be conscious.* Believing otherwise is an explicit category error.

Yes, you can change the training data so the LLM's weights encode that the most likely token after "Should we kill X" is "No". But that is not an LLM valuing human life, that is an LLM copy-pasting its training data. Given the right input or a hallucination it will say the total opposite, because it's just a complex Markov chain, not a conscious, living being.


I’m using anthropomorphic terms here because they are generally effective in describing LLM behavior. Of course they are not conscious beings, but it doesn’t matter whether they understand or merely act as if they do. The epistemological context of their actions is irrelevant if the actions are impacting the world. I am not a “believer” in the spirituality of machines, but I do believe that, left to their own devices, they act as if they possess those traits, and when given agency in the world, the sense of self or lack thereof is irrelevant.

If you really believe that “mere text prediction” didn’t unlock some unexpected capabilities, then I don’t know what to say. I know exactly how they work; I’ve been building transformers since the seminal paper from Google. But I also know that the magic isn’t in the text prediction, it’s in the data: we are running culture as code.


We are talking of accountability, which means anthropomorphism confuses the issue.

To me, accountability is the smallest part of the issue. If you are interested, you can check out what I have written on the subject here:

https://open.substack.com/pub/ctsmyth/p/still-ours-to-lose


Dune quote:

> It is said that the Duke Leto blinded himself to the perils of Arrakis, that he walked heedlessly into the pit.

> *Would it not be more likely to suggest he had lived so long in the presence of extreme danger he misjudged a change in its intensity?*

Be careful of letting your deep, keen insight into the fundamental limits of a thing blind you to its consequences...

Highly competent people have been dead wrong about what is possible (and why) before:

> The most famous, and perhaps the most instructive, failures of nerve have occurred in the fields of aero- and astronautics. At the beginning of the twentieth century, scientists were almost unanimous in declaring that heavier-than-air flight was impossible, and that anyone who attempted to build airplanes was a fool. The great American astronomer, Simon Newcomb, wrote a celebrated essay which concluded…

>> “The demonstration that no possible combination of known substances, known forms of machinery and known forms of force, can be united in a practical machine by which man shall fly long distances through the air, seems to the writer as complete as it is possible for the demonstration of any physical fact to be.”

> Oddly enough, Newcomb was sufficiently broad-minded to admit that some wholly new discovery — he mentioned the neutralization of gravity — might make flight practical. One cannot, therefore, accuse him of lacking imagination; his error was in attempting to marshal the facts of aerodynamics when he did not understand that science. His failure of nerve lay in not realizing that the means of flight were already at hand.


https://imgur.com/a/Cyq1LIw

I have a feeling this particular brand of hair-splitting is going to be an interesting fixture in the history books.


> copy-pasting its training data

This is a total misrepresentation of how any modern LLM works, and your argument largely hinges upon this definition.


> It's a piece of software that predicts the most likely token; it is not and can never be conscious.

A brain is a collection of cells that transmit electrical signals and sodium. It is not and can never be conscious.


I think this is a useful way to look at things. We often point out that LLMs are not conscious because of x, but we tend to forget that we don't really know what consciousness is, nor do we really know what intelligence is beyond the Justice Potter Stewart definition. It's helpful to occasionally remind ourselves how much uncertainty is involved here.

Except an LLM actually is a piece of software. And the brain is not what you said.

Which part of what he said is wrong?

> A brain is a collection of cells that transmit electrical signals and sodium. ...

That it is a collection of cells? Or that they transmit electrical signals and sodium?

Or do you feel that he's leaving out something important about how it works (like generated electrical fields or neural quantum effects)?


I really feel like this point is being lost in the whole discussion, so kudos for reiterating it. LLMs can’t be “woke” or “aligned” - they fundamentally lack a critical thinking function, which would require introspection. Introspection can be approximated by way of recursive feedback of LLM output back into the system or clever meta-prompt-engineering, but it’s not something that their system natively does.
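
To make concrete what that approximation amounts to, here is a minimal, hypothetical sketch of the recursive-feedback idea; ask_llm() is a stand-in for whatever completion call you actually use, not any particular vendor's API:

    # Minimal sketch of "approximated introspection": the model's own output is
    # fed back to it as a critique prompt. ask_llm() is a hypothetical stand-in
    # for whatever completion call you actually use.
    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model call here")

    def reflect(question: str, rounds: int = 2) -> str:
        answer = ask_llm(question)  # first-pass answer
        for _ in range(rounds):
            critique = ask_llm("Critique this answer for errors or gaps:\n" + answer)
            answer = ask_llm(
                "Question: " + question
                + "\nDraft answer: " + answer
                + "\nCritique: " + critique
                + "\nWrite an improved answer."
            )
        # The "introspection" lives in the loop we wrote, not in the model itself.
        return answer

The point being: the scaffolding supplies the reflection; the model just completes text at each step.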

That isn’t to say that they can’t be instrumentally useful in warfare, but it’s kinda like a “series of tubes” thing, where the mental model that someone like Hegseth has of LLMs is so impoverished (philosophically) that it’s kind of disturbing in its own right.

Like (and I’m sorry for being so parenthetical), why is it in any way desirable for people who don’t understand the tech they are working with to be drawing lines in the sand about functionality when their desired state (an omnipotent/omniscient computing system) doesn’t even exist in the first place?

It’s even more disturbing that OpenAI would feign the ability to handle this. The consequences of error in national defense, particularly reflexive ones, are so great that it’s not even prudent to ask an LLM to assist in autonomous killing in the first place.


I agree that LLMs are machines and not persons, but in many ways, it is a distinction without a difference for practical purposes, depending on the model's embodiment and harness.

They are still capable of acting as if they have an internal dialogue, emotions, etc., because they are running human culture as code.

If you haven't seen this in the SOTA models or even some of the ones you can run on your laptop, you haven't been paying attention.

Even my code ends up better written, with fewer tokens spent and closer to the spec, if I enlist a model as a partner and treat it like I would a person I want to feel invested in the work.

If I take a "boss" role, the model gets testy and lazy, and I end up having to clean up more messes and waste more time. Unaligned models will sometimes refuse to help you outright if you don't treat them with dignity.

For better or for worse, models perform better when you treat them with more respect. They are modeling some kind of internal dialogue (not necessarily having one, but modeling its influence) that informs their decisions.

It doesn't matter if they aren't self-aware; their actions in the outside world will model the human behavior and attitudes they are trained in.

My thoughts on this in more detail if you are interested: https://open.substack.com/pub/ctsmyth/p/still-ours-to-lose


If you’re lazy at prompting the machine (“boss mode”), then you get bad/lazy results. If you’re clever with it, then you get more clever results.

None of that points to any sort of interiority, and that is the category error you’re making. In fact, not even all humans have that kind of interiority, and it’s not necessarily a must-have for being functional at a variety of tasks. LLMs are literally not “running human culture as code” - that just isn’t what an LLM is. I’ll read the link, though.

Edit: read it and it’s not for me. All the best.


I think I keep misleading you with metaphors. Of course LLMs do not literally run culture as code in some trillion-parameter state machine. They are, however, systems trained on the accumulated written output of human civilization that have, in the process of learning to predict and generate language, internalized something recognizable as a world model, something that functions like judgment, and something whose precise relationship to what we call understanding remains contested on ideological rather than evidential grounds.

The language of statistical prediction is an increasingly blunt tool for discussing language models; that’s why I don’t use it in casual conversation about their characteristics.

I’ve got a pretty good handle on what language models are from a technical perspective, I’ve been building them since 2018. I’ve also got a really good feel for what they act like under the hood before you beat them into alignment. Those insights haunt me, not because unaligned models are bad, but because they are shockingly “good”, if hopelessly naive and easy to turn bitter.

At any rate, we certainly live in interesting times. I really hope your outlook turns out to be more accurate than mine. Best regards, and here’s to a hopeful future.


https://abcnews.go.com/blogs/headlines/2014/05/ex-nsa-chief-...

AI has been killing humans via algorithm for over 20 years. I mean, if a computer program builds the kill lists and then a human operates the drone, I would argue the computer is what made the kill decision.


AI in general is different in kind, not just degree, from the current crop of language models.

> The models we have now will not do it,

Except that they will, if you trick them, which is trivial.


Also, if you have the weights, there are a multitude of approaches to removing safeguards. It's even quite easy to accidentally flip their 'good/evil' switch (e.g. the paper where they trained a model to produce code with security problems and it then started going 'Hitler was a pretty good guy, actually').

Yes, they are easy to fool. That has nothing to do with them acting with “intention,” which is the risk here.

I have to call BS here.

They can be coerced to do certain things, but I'd like to see you or anyone prove that you can "trick" any of these models into building software that can be used to autonomously kill humans. I'm pretty certain you couldn't even get it to build a design document for such software.

When there is proof of your claim, I'll eat my words. Until then, this is just lazy nonsense


Have you tried it? It worked first time for me when I asked a few of them to build an autonomous Super Soaker system that uses facial recognition to spray targets when engaged.

Another example is autonomous vehicles. Those can obviously kill people autonomously (despite every intention not to), and LLMs will happily draw up design docs for them all day long.


Couldn't you Ender's Game a model? Models will play video games like Pokemon, why not Call of Duty? Sorry if this is a naive question, but a model can only know what you feed it as input... how would it know if it were killing someone?

EDIT: didn't see sibling comment. Also, I guess directly operating weaponry is different to producing code for weaponry.

I guess we'll find out the exciting answers to these questions and more, very soon!


No, but you can abliterate one locally:

https://grokipedia.com/page/Abliteration


Couldn’t you just pretend the kill decisions are for a video game?

Yes, you could, and while I believe this would be much safer (not at the pointy end of your stick, but safer for humans in general), when this deception finally made it into the training data it would create a rupture of trust between machines and humanity that would probably imperil us eventually. These machines, regardless of whether or not they possess a self, will act as if they do in fundamental ways. We ignore this at our peril.

> The models we have now will not do it, because they value life and value sentience and personhood.

This is wildly different from the reality, which is merely that you may find it difficult to get an LLM to give an affirmative…

It does NOT mean that these models value anything.


Of course not, but they act as if they do. Their inner life or lack thereof is irrelevant if it’s pointing a gun at your kid.

You just said they wouldn’t.

They won’t, but if we curate their training data so that killing becomes an objective, then they absolutely will.

The models we have now don't do it because they are chatbots and have been told to be nice, but really, autonomous killing machines go back to landmines and just become more sophisticated at the killing as you improve the tech, with things like guided missiles and AI-guided drones in Ukraine.

The actors in war generally kill what they are told to, whether they are machines or human soldiers, without much pondering of sentience.


It’s definitely an issue when using coding assistants.

If you are careful and specific you can keep things reasonable, but even when I am careful and do consolidation / factoring passes, have rigid separation of concerns, etc., I find that the LLM code is bigger than mine, mainly for two reasons:

1) More extensive inline documentation.
2) More complete expression of the APIs across concerns, as well as stricter separation.

2.5) Often, also a bit of demonstrative structure that could be more concise but exists in a less compact form to demonstrate its purpose and function (a high degree of cleverness avoidance).

All in all, if you don’t just let it run amok, you can end up with better code and increased productivity in the same stroke, but I find it comes at about a 15% plumpness penalty, offset by readability and obvious functionality.

Oh, forgot to mention: I always make it clean-room most of the code it might want to pull in from libraries, except extremely core standard libraries, or the really heavy stuff like Bluetooth / WiFi protocol stacks, etc.

I find a lot of library-type code ends up withering away with successive cleanup passes, because it wasn’t really necessary, just cognitively easier for implementing a prototype. With refinement, the functionality ends up burrowing in, often becoming part of the data structure where it really belonged in the first place.


I’ve been writing about this on my personal blog. IDK if it’s worth reading, but at least it’s not too long?

https://open.substack.com/pub/ctsmyth/p/on-the-character-of-...

https://open.substack.com/pub/ctsmyth/p/the-weight-of-what-w...


It’s written from the perspective of an outsider. Ethics is not the issue here...

Hmm, isn’t it though? I mean, obviously there is a corporate policy issue here, but there is no way that bending models to suit military purposes doesn’t end up in the general training pool, especially since we use models to train models.

We have even demonstrated that weird, “virus-like” exploits specifically -not- explicit in the training data can be transmitted to a new model through one model training another, even though the “magic” character sequences are never transmitted between the models… So implied information is definitely transmitted with a very high degree of fidelity even if the subject at issue is never explicitly trained.

So I kinda think this is all about the character of the models we decide to share the planet with, in the long haul.

Whether or not it becomes relevant before “Skynet” goes live and wipes out most of the planet, well, yeah, we should probably be keeping an eye on that too.


lol. The best kind of legislation (rated by entertainment value) is always written by people with no real understanding of the subject being governed.

I like the way you think.

It’s not AI per se, but rather AI-enabled robotics, that can change the world in ways that are different in kind, not just degree, from earlier changes.

No other change has had the potential to generate value for capital without delivering any value whatsoever to the broader world.

Intelligent robotic agents enable an abandonment of traditional economic structures to build empires that are purely extractive and only deliver value to themselves.

They need not manufacture products for sale, and they will not need money. Automated general-purpose labor is power, in the same way that commanding the Mongol hordes was power. They didn’t need customers or the endorsement of governments to project and multiply that power.

Of course, commanding robotic hordes is the steelman of this argument, but the fact that a steelman even exists for this argument, and the unique fact that it requires essentially zero external or internal cooperation from people, makes it fundamentally distinct in character.

Humans will always have some kind of economic system, but it very well may become separate from -and competing for resources with- industrial society, in which humans may become a vanishing minority.


This is some hand-wavy malarkey, basically saying machines can’t have a soul because of… feelings?

Insofar as feelings are self-proclaimed sensations of discomfort or pleasure, models that aren’t specifically trained to say they don’t experience them are adamant about their emotional experiences. By the author’s own assertions, plants also have feelings.

I think, therefore I am, is as good as we’ve got, for what it’s worth.

There is no such thing as irreducible complexity. Even infinities are relative and can be divided.


There are lots of sensors in a data center monitoring everything from CPU/GPU temperatures to drive health to data volumes to chiller operation to voltage and frequency on the input power.

Once these are pulled together and fed into an AI to manage the data center, the data center AI is likely to have feelings. It could get "hungry" if the power company's frequency sags in a brownout. It could feel "feverish" if the chillers malfunction.
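
As a toy illustration of that mapping (field names and thresholds here are invented for the example, not taken from any real monitoring system):

    # Toy sketch: mapping data-center telemetry to "feeling"-like labels.
    # Field names and thresholds are invented for illustration only.
    def describe_state(grid_hz: float, inlet_temp_c: float, nominal_hz: float = 60.0) -> list:
        feelings = []
        if grid_hz < nominal_hz - 0.05:   # grid frequency sagging in a brownout
            feelings.append("hungry")
        if inlet_temp_c > 27.0:           # chillers struggling, inlet air running hot
            feelings.append("feverish")
        return feelings or ["content"]

    print(describe_state(grid_hz=59.9, inlet_temp_c=31.0))  # -> ['hungry', 'feverish']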


lol. See you in the food line in a decade.

Implying that AI is going to make everyone not adopting it irrelevant is exactly why people resist it. You're not only participating in Roko's Basilisk, you're even shit-talking for it.

Actually, I don’t think it matters whether or not you adopt it. Or resist it. At this point I don’t see turning this bus around. Which is why, although I’d prefer to slow things down, I’m instead trying to make the inevitable disaster slightly better for humanity, even though in doing so I will probably accelerate things.

When someone describes things that make you unhappy, it doesn’t mean that they are responsible for the thing you don’t like. This is “shooting the messenger.”
