Hacker News | new | past | comments | ask | show | jobs | submit | lifecodes's comments | login

hmm you are right, I too wish the same brother

the CoT bug, where in 8% of training runs the model could see its own scratchpad, is the scariest part to me. and of course it had to be in the agentic tasks, exactly where you need to trust what the model is "thinking"

the sandwich email story is wild too. not evil, just extremely literal. that gap between "we gave it permissions" and "we understood what it would do" feels like the whole problem in one anecdote

also the janus point landed: if you build probes to see how the model feels and immediately start deleting the inconvenient ones, you've basically told it honesty isn't safe. that seems like it compounds over time

It's scary to think that a very intelligent AI model might not be honest with us...

Ultron is not far off, I guess...


Hey there, I read your article, it was good.

But the first time I saw the title, I literally felt like you had copied me.

But of course, you have your own original take.

Check out my article: https://blog.eshanstudio.com/posts/case-against-humanity/


MANAGED AGENTS sounds like progress, but also like we’re standardizing around the current limitations instead of solving them.


I guess it would be much cheaper to attach an API layer to everything we've built so far than to teach these AIs to control things in the real world the way humans do.

I mean, given the cost of training, building more APIs for everything we already have makes sense to me.

What do you think?


I think the big thing I was trying to highlight in this article was that not much effort has been put into spatial and image awareness. In my limited experiments, where I would manually ask the models to take an image and highlight things (like "circle all elbows"), they do a great job... but if you ask the model where an elbow is in the image (in pixels), it does a poor job.

Or, put another way, the `image->model->tool` path seems to be an area for improvement.
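To make the `image->model->tool` handoff concrete, here's a minimal sketch of just the tool-side half: turning a model's free-text reply into pixel coordinates a downstream tool can act on. The reply format, the example string, and the function name are all hypothetical assumptions, not any specific model's API; real replies usually need looser parsing than this.

```python
import re


def parse_pixel_coords(model_reply: str) -> list[tuple[int, int]]:
    """Extract (x, y) pixel pairs from a model's free-text reply.

    Assumes the model was prompted to answer using "(x, y)" notation;
    this is the fragile step when grounding is weak.
    """
    return [
        (int(x), int(y))
        for x, y in re.findall(r"\((\d+),\s*(\d+)\)", model_reply)
    ]


# Hypothetical reply to "where are the elbows, in pixels?"
reply = "I see two elbows: one at (412, 305) and another at (118, 290)."
print(parse_pixel_coords(reply))  # [(412, 305), (118, 290)]
```

Even when this parsing works, the coordinates themselves are only as good as the model's spatial grounding, which is exactly the gap the experiment above points at.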


I guess we are reaching the point where "10T parameters" sounds more like a marketing number than a meaningful metric.

Between MoE, aggressive quantization, and synthetic data pipelines, it's getting harder to tell whether bigger models are actually better, or just more expensive to train.

It would be more interesting to see capability per dollar or per watt, not parameter count...


If this holds, does it unlock 100B+ models running locally in ~tens of GB RAM? Or does accuracy collapse before that point?
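Some back-of-envelope math on that question: weight-only memory is roughly params × bits ÷ 8, so the sketch below shows where "tens of GB" becomes plausible for a 100B model at different quantization widths. This deliberately ignores KV cache, activations, and runtime overhead, so treat it as a lower bound, not a deployment estimate.

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4, 2):
    print(f"100B params @ {bits}-bit: {weight_memory_gb(100e9, bits):.0f} GB")
# 16-bit: 200 GB, 8-bit: 100 GB, 4-bit: 50 GB, 2-bit: 25 GB
```

So the weights alone fit in "tens of GB" only around 2-4 bits per parameter, which is exactly the regime where accuracy degradation becomes the open question.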

