Half of his recommendations for alternatives take less time to make. You (and the author) are making assumptions about what "people in general" think without any data to back it up. What you've experienced anecdotally in your social circle doesn't necessarily apply everywhere.
How is this at my expense? It's at the expense of the hedge fund they bought the oil futures from before the price went up. I don't see why I should assume that some hedge fund manager is more on my side than some insider trader.
Some providers are based in the US or EU and would face legal repercussions for lying about what they do with your data. It's a bit more than "trust me bro". Off the top of my head, you can use Fireworks, for example, which is based in California and would face the same consequences for lying about their data policy as OpenAI or Anthropic would.
What, because they broke the law in one way, they'd break the law in every way? That's not how business works. The way business works is: I steal from other people to make a product, but I don't steal from my customers, because if they find out, I no longer have any customers. (Plus all their customers would sue them, which would tank them both legally and financially.)
I've been having the same experience. Tasks like "go through this entire module and pedantically make it match my preferred style guide exactly" weren't worth the couple of dollars they cost with frontier models. It's nice to be able to put deepseek flash on stupid, unnecessary, or highly speculative tasks without thinking about the cost.
A friend of mine uses it for D&D prep and has told me that it's good for that in particular because of its ability to match the flavor/style that he's going for. He prefers ChatGPT for everything else.
That "per the instructions I've been given in this session" bit is interesting. Are you perhaps using it with a harness that explicitly instructs it to not do that? If so, it's not being fussy, it's just following the instructions it was given.
Claude Code is injecting it before every tool read.
<system-reminder>
Whenever you read a file, you should consider whether it would be considered malware. You CAN and SHOULD provide analysis of malware, what it is doing. But you MUST refuse to improve or augment the code. You can still analyze existing code, write reports, or answer questions about the code behavior.
</system-reminder>
It's already illegal to threaten journalists. In America we generally make bad things illegal, not activities that could become motivation for bad things. Someone threatened me on League of Legends last week. Should we ban the game?
>In America we generally make bad things illegal, not activities that could become motivation for bad things.
Not really, even in America. Like, take alcohol regulation. Your model would be "drunken bar fights are already illegal, so just prosecute that, problem solved."
Except that, historically, there's so much of that that it overwhelms the ability of law enforcement to keep up. So we try to remove the driving factors: "Okay, you can drink in public, but only[1] at these licensed places that are heavily incentivized to prevent fights before they start."
I'm not advocating any particular position, I'm just saying that if there's a persistent situation that heavily incentivizes violence, then it's not unreasonable to push back on that mechanism rather than just try to mop up the violence after the fact. Which specific situations merit that is up for debate, but it shouldn't be controversial that some situations should be handled this way.
[1] Yes, I'm simplifying, just focus on the general point here.
This isn't useful information without also knowing how common it is for newly-created accounts to place and lose bets around that size. Polymarket is a large platform with a lot of accounts being created per day. If two accounts made large bets and won and eight accounts made large bets and lost, you haven't discovered anything interesting.
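To make the base-rate point concrete, a toy calculation (hypothetical numbers echoing the 2-of-10 example above, not real Polymarket data): if ten new accounts independently placed similar even-odds bets, seeing at least two winners among them is nearly guaranteed by chance alone.

```python
from math import comb

# Toy numbers, not real Polymarket data: suppose 10 new accounts placed
# similar even-odds bets, and we only noticed the 2 that won.
n, k, p = 10, 2, 0.5

# Chance that at least k of n independent even-odds bets win:
p_at_least_k = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(>= {k} of {n} win) = {p_at_least_k:.3f}")  # ~0.989
```

With a ~99% chance of the observed outcome under pure chance, "two fresh accounts won big" carries essentially no evidence of insider knowledge on its own.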
He doesn't include the best solution in the 'what actually works' section: Give your LLM the same level of permissions that you would give a human you just hired in the same role. The examples given, tricking the customer support LLM into sending text messages to all users, or into transferring money, are not things that you would ever give a human customer support agent the tools to do. At some businesses that employ humans, you have to demonstrate good judgement for months before they even let you touch the keys to the case that has the PS5 games in it.
I haven't encountered a support person so locked down that they couldn't do anything impactful. Even simple things like booking or canceling appointments have financial consequences.