the CoT bug where in 8% of training runs the model could see its own scratchpad is the scariest part to me. and of course it had to be in the agentic tasks, exactly where you need to trust what the model is "thinking"
the sandwich email story is wild too. not evil, just extremely literal. that gap between "we gave it permissions" and "we understood what it would do" feels like the whole problem in one anecdote
also the janus point landed: if you build probes to see how the model feels and immediately start deleting the inconvenient ones, you've basically told it honesty isn't safe. that seems like it compounds over time
It's scary to think that a very intelligent AI model might not be honest with us.
I guess it would be much cheaper to attach an API version of everything we've developed so far than to teach these AIs to control things in the real world the way humans do.
I mean, if we look at the cost of training, building more APIs for everything we have makes sense to me.
I think the big thing I was trying to highlight in this article was the fact that not much effort has been put into spatial and image awareness. In my limited experiments, where I would manually ask the models to take an image and highlight things (like "circle all elbows"), they do a great job... but if you ask a model where an elbow is in the image (in pixels), it does a poor job.
Or maybe put another way, going from `image->model->tool` seems to be an area for improvement.
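A rough sketch of what I mean, in Python (`query_vlm` here is a hypothetical stand-in for whatever vision-model API you're calling, not a real library function): the model has to emit usable pixel coordinates that a dumb downstream tool can act on, instead of rendering the overlay itself.

```python
import json
from PIL import Image, ImageDraw

def query_vlm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a vision-language model call.
    Swap in your provider's actual image+text API here."""
    raise NotImplementedError

def circle_elbows(image_path: str) -> Image.Image:
    # Ask the model for raw pixel coordinates rather than a rendered
    # overlay. In my experiments this is the step that fails: models
    # that can "circle all elbows" themselves struggle to name pixels.
    prompt = (
        "Return only a JSON list of [x, y] pixel coordinates, "
        "one per elbow visible in this image."
    )
    points = json.loads(query_vlm(image_path, prompt))

    # Hand those coordinates to a plain drawing tool.
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    r = 20  # circle radius in pixels
    for x, y in points:
        draw.ellipse([x - r, y - r, x + r, y + r], outline="red", width=3)
    return img
```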
I guess we are reaching the point where “10T parameters” sounds more like a marketing number than a meaningful metric.
Between MoE, aggressive quantization, and synthetic data pipelines, it’s getting harder to tell whether bigger models are actually better, or just more expensive to train.
It would be more interesting to see capability per dollar or per watt, not parameter count...