When I see the domain of a post is neal.fun, I instantly get a huge grin because I know I am about to be delighted. Thank you Neal! The beach yurt with the mushroom soup was a hilarious touch.
When I asked some frontier models, many said that Teresa T is "widely referenced", which is evidence of your popularity and the ripple effects of your posts, so it would be interesting to see the same result from an unknown blog.
> When I asked some frontier models, many said that Teresa T is "widely referenced", which is evidence of your popularity and the ripple effects of your posts
That is some serious Gell-Mann-style amnesia. You're trusting LLMs to give you accurate information about a subject we've already established they can't be trusted on (and which is the only reason we're talking about them).
“Widely referenced” is a common term which LLMs obviously pick up. Them outputting those words has no bearing on the truth and says nothing about the “popularity and the ripple effects of [Simon’s] posts”.
I've been using the API for a portion of my work to prepare for this and to test how much it will cost in the long term. Turns out that was the short term.
It is rough, but it has taught me to treat every prompt and process with care, since I watch the pennies and dollars burn instead of tokens. That's a good habit to get into anyway.
Copilot has said they'll be publishing previous months' "if you did token pricing, it would have cost $x" figures, so a lot of us will have real numbers to actually anticipate our spend.
Personally, I'm anticipating agentic coding will be out of my price bracket (a single agent run costing US$20+ is far beyond what I can justify, especially with how often it fails). I'm planning on going back to optimised prompts for one-pass edits.
I use pi (https://pi.dev/) as my coding agent, which let you use your subscription until Claude recently introduced extra usage billing, and I was routinely burning through the equivalent of my Claude Max subscription in an hour, multiple times a day.
The wolf photo for the article was the most eerie example for me... if I am reading about the natural world, I want to see a real photo of the natural world.
What is the moat of the major ticketing companies? Is it deals with venues? It is hard to rationalize how one company can even get a stranglehold on an entire market like this.
I feel like I could ping any random HN user and build something better in a week, which means it has surely been done many times already... so why don't alternatives gain traction?
Reagan halted antitrust enforcement and nobody reversed it, so they were allowed to own controlling stakes in several industries and freeze out competitors. They get exclusives with bands, agencies, venues, and promoters, so at every point, anyone who tries to do something else runs into a package they can't compete with: a band that doesn't play ball won't get the big venues, a venue that acts independently won't get the big acts, etc. It's clearly abusive, but they managed to spread enough money around to avoid action before Biden, and then Trump overturned that because he has the same mentality.
I have been thinking that these SWE benchmarks will continue to improve: these companies hire very intelligent software engineers, they can task a multitude of them with solving problems, and then train the model on those answers.
Data has always been the core of it all, onward to the next abstraction, I suppose.
I think computational thinking, or basically "how do I solve this problem efficiently" training data, is more valuable than feeding in answers. I don't know what these AI models' training data consists of, but it would be interesting to see a model trained purely on reasoning, methods, and those foundational skills (basic programming? or maybe not) and then give it some benchmarks.
Even with search grounding, it scored a 2.5/5 on a basic botanical benchmark. It would take much longer for the average human to do a similar write-up, but they would likely do better than 50% hallucination if they had access to a search engine.
Training for tasks still works pretty well, but "vision" is a super broad domain, and most models seem optimized for OCR and screen processing (which have verifiable outputs and relatively straightforward data generation).
I think a lot of Dwarkesh's mentality about AI being inevitable / ubiquitous comes from the same part of him that thinks that artificial things are "good enough", e.g. the way he allows his production team to use fake plastic plants on set. Is he correct? I'm not sure, but I know there are at least a few people who notice the difference.
I've always listened to the podcast and forget they even have videos. I have a hard time imagining myself sitting and watching a two-hour interview when I could listen while exercising or doing chores. Am I missing anything?