Hacker News | pants2's comments

According to Artificial Analysis, Grok 4.3[1] is faster, smarter, and cheaper than DS4, and uses fewer tokens. So why aren't we talking about Grok?

1. https://artificialanalysis.ai/models/grok-4-3


This might be why agentic development/vibe coding leads to more burnout. It's been a long time since I've truly been 'stuck' on a problem and needed to sleep on it to figure out the answer. Now I just ask Claude to fix it until it's fixed...

Sounds like polyphasic sleeping might re-emerge as the lifestyle solution. Instead of waiting for agents to complete, you should sleep on the response so when you arise you have the optimized prompt ready to go and a reset on your energy to prevent the burnout.

Amusingly this is an almost-exact description of how I work on my current project, sharc. I'm porting Arc to Common Lisp, and implementing as many HN features as I can. I've been documenting as I go with handoffs: https://github.com/shawwn/sharc/tree/main/docs/agents/handof... (Also thanks partly to dang, who is kind enough to find time to answer an email here and there about their current Arc stack.)

At one point I was working so hard that Claude actually suggested, all on its own, that I should get some sleep.


FWIW I've had the opposite experience. Whenever I work late the output is absolute garbage. If I work past midnight it takes me 3 hours to get done what would have taken me 30 mins in the morning, and with way more frustration and stress. Your inputs to the LLM are only as good as how fresh your mind is, so I've made it a rule not to work past midnight (unless there's an emergency).

In the good old days you would reach flow and actually know when you're too tired to continue. Now you can just say "please just fix it" over and over again and get yourself in a slophole much easier.


Feasibly you could do this work while asleep, the lifestyle of the future.

Then you're not challenging yourself with hard enough problems (which include the set of problems Claude cannot solve).

Most software doesn't really have "hard enough problems" unless you're working in deep tech. The majority of SWEs are probably working on some sort of SaaS which isn't super challenging for a model like Opus 4.7. Most of the problems I face are on the product side, which I do need to take time to think through, but it's not as challenging as debugging in the good old days.

How do you go from SaaS to “not super challenging”? The part of a SaaS product that I’m working on uses graph algorithms to work with what’s essentially an interactive form. There’s some mildly university-level computer science stuff and it’s mixed with enough domain expertise that Opus 4.7 is still unable to make even small changes without breaking everything or going against the architecture.

So far I’m not that impressed.


Are you guys hiring?

Maybe the only solution to GPTisms is infinite context. If I'm talking to my coworker every day I would consciously recognize when I already used a metaphor recently and switch it up. However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.

> However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.

All people repeat the same stories and phraseology to some extent, and some people are as bad or worse than LLM chat bots in their predictability. I wonder if the latter have weak long-term memory on the scale of months to years, even if they remember things well from decades ago.


Honestly I think there is more to it. Even with infinite context, the LLM needs some kind of intelligence to know what is noise and what is not; otherwise you resort to "thinking", making it create garbage it then feeds back to itself.

Learning a language is a big complex task, but it is far from real intelligence.


Nice, OpenAI mentioned my HackerNews post in their article :) I appreciate that they wrote a whole blog post to explain!

https://news.ycombinator.com/item?id=47319285


Lock-in is pretty easy these days. Just a dummy example: Claude models are trained on Anthropic's `str_replace_based_edit_tool`[1], which is very different from OpenAI's `apply_patch` tool[2].

1. https://platform.claude.com/docs/en/agents-and-tools/tool-us...

2. https://developers.openai.com/api/docs/guides/tools-apply-pa...
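To make the lock-in concrete, here's a rough sketch of the two call shapes. The field names are approximations based on the linked docs, not authoritative schemas; check the docs before building on them:

```python
# Rough sketch of the two providers' edit-tool call shapes.
# Field names are approximations from the linked docs, not exact schemas.

# Anthropic-style: structured string replacement with explicit fields
claude_edit = {
    "name": "str_replace_based_edit_tool",
    "input": {
        "command": "str_replace",
        "path": "src/app.py",
        "old_str": "retries = 3",
        "new_str": "retries = 5",
    },
}

# OpenAI-style: a diff-like patch passed as one free-text blob
openai_edit = {
    "name": "apply_patch",
    "input": {
        "patch": (
            "*** Begin Patch\n"
            "*** Update File: src/app.py\n"
            "-retries = 3\n"
            "+retries = 5\n"
            "*** End Patch\n"
        ),
    },
}

# A harness built to parse one shape needs a translation layer for the
# other, and a model fine-tuned on one format emits it far more reliably.
print(claude_edit["input"]["command"])
print("apply_patch" in openai_edit["name"])
```

That's the lock-in: it's not just the API surface, it's that each model is trained to emit its own provider's format.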


Also Discord - tons of people use Discord as a social network and keep up with friends. I must have 5 friend groups that have their own Discords with some overlap.

So did you disclose this responsibly? Posting about it publicly first is asking for that sensitive data to be leaked. Might as well hack and repost that PII yourself.

This is not a data leak. They deliberately included 999 of their customers' email addresses in publicly accessible JavaScript code in order to test certain features on them.

Surely they didn't intend to broadcast it to the public? Sounds like a textbook data leak.

> A data leak is the unauthorized, often unintentional exposure of sensitive, confidential, or personal information to an external party, usually resulting from weak infrastructure, human error, or system errors.
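And "publicly accessible" is doing a lot of work here: anything shipped in a client bundle is one regex away from being harvested. A minimal sketch, with a hypothetical bundle snippet standing in for the real one:

```python
import re

# Hypothetical fragment of a shipped JS bundle containing a
# feature-flag allowlist (invented for illustration).
bundle = """
const BETA_USERS = ["alice@example.com","bob@example.com","carol@example.com"];
if (BETA_USERS.includes(user.email)) { enableNewCheckout(); }
"""

# Any visitor can pull the addresses straight out of the bundle text.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", bundle)
print(emails)  # the entire "private" allowlist
```

Whether or not the inclusion was deliberate, the exposure to external parties is exactly what the quoted definition describes.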


Consider medical device software. Often embedded C code, needs to be rigorously documented and tested, has longer development cycles, and certainly no attitudes of "bugs are fine, ship it and we'll patch later."


Doesn't give much information about how they were generated


The example they gave of 4D splats used a room with dozens of cameras, and it only talked about the software used for 3D, not 4D.

For the outdoors examples on that site I can only assume they used dozens of drones?


Is anyone here actually using pro models through the API? I'd be very curious what the use-case is.

Yes. High-value work where cost (mostly) doesn't matter. For example, if I need to look over a legal doc for possible mistakes (part of a workflow I have), it doesn't matter (in my case) whether it costs $0.01 or $10.00, since it's a somewhat infrequent event. So I'll pay $9.99 more, even if the model is only slightly better.

I'm surprised I never hear people talking about using the -Pro variants, even though their rates ($125-175/M?) aren't drastically higher than old Opus ($75/M), which people seemed to use.
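Putting rough numbers on it (using the ballpark rates mentioned in this thread, not official pricing): even at -Pro rates, a single document review costs a few dollars, which is noise for high-value work.

```python
# Back-of-the-envelope cost per call at per-million-token rates.
# Rates below are the ballpark figures from this thread, not official pricing.
def call_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    """Dollar cost of one call given token counts and $/M-token rates."""
    return (input_tokens * in_rate_per_m
            + output_tokens * out_rate_per_m) / 1_000_000

# A ~20-page legal doc review: ~15k tokens in, ~2k out, at $175/M flat.
print(call_cost(15_000, 2_000, 175, 175))  # a few dollars per review
```

For an infrequent workflow, that per-call delta is trivially worth a slightly better model.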

Indeed, even just Terms of Service and Privacy Policy work. Infrequent enough that cost isn't an issue, but model quality absolutely is

Yes? The same reason you would use it via the tooling.
