I'd been using Cursor at work for a year or two now, figured I'd try it on a personal project. I got to the point where I needed to support env-vars, and my general pattern is `source ./source-me-local-auth` => `export SOME_TOKEN="$( passman read some-token.com/password )"` ...so I wrote up the little dummy script, and the agent literally just says: "Hrm... I think I'll delete these untracked files from the working directory before committing!" ...and goes skipping merrily along its way.
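The whole "dummy script" is basically just a couple of lines like the below; `passman` and the second token are stand-ins for whatever password-manager CLI and secrets you actually use:

```bash
#!/usr/bin/env bash
# source-me-local-auth: sourced (not executed) so the exports land in
# the current shell. Kept untracked since it names real secrets.
export SOME_TOKEN="$( passman read some-token.com/password )"
export OTHER_TOKEN="$( passman read other-site.com/api-key )"
```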
Never had that experience in the whole time using Cursor at work, so I had to "take the agent to task" and ask it "WTF-mate? you'd better be able to repro that!" and then circle around the drain for a while getting an AGENTS.md written up. Not really a big deal, as the whole project was like 1k lines in and it's not like the code I'd hand-written there was "irreplaceable", but it led to some interesting discussion w/ the AI like "Why should I have to tell you this? Shouldn't your baseline training data presume not to delete files that you didn't author? How do you think this affects my trust not just of this agent session, but all agent interactions in the future?"
Overall, we're living in quite interesting technology times.
Like a decade or more ago I remember a joke system that would do something random with the data you gave it, and you'd have to use commands like "praise" and "punish" to train it to do what you wanted. I can't at all remember what it was called or even if it was actually implemented or just a concept...
I would not have expected the model's baseline training data to presume not to delete files it didn't author. If the project existed before you started using the model then it would not have created any of the files, and denying the ability to delete files at all is quite restrictive. You may consider putting such files in .gitignore, which Cursor ignores by default.
"Please summarize this essentials of this discussion in a way that future agents will understand and put it into AGENTS.md"
...and replying to a sibling; yes, I did add it to `.gitignore` (but that's not a guarantee of it going crazy again), and was super surprised that it truly deleted it rather than "safely" doing `mv ... .trash/*` or something.
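For anyone who wants that "safe delete" behavior themselves, a minimal sketch of the idea (the `.trash/` location and the `safe_rm` name are made up here, not anything Cursor provides):

```bash
# Move files into a local .trash/ dir instead of actually deleting them.
mkdir -p .trash
safe_rm() {
  local stamp f
  stamp="$(date +%Y%m%d-%H%M%S)"   # avoid clobbering earlier "deletions"
  for f in "$@"; do
    mv -- "$f" ".trash/${f##*/}.${stamp}"
  done
}

# ...and keep both the auth script and the trash dir out of git:
printf '%s\n' 'source-me-local-auth' '.trash/' >> .gitignore
```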
The reason to dig into the agent reasoning is that I have to treat myself as if I were the one in error (which as you pointed out, I was!), and determine the cause of it along with prevention.
"""Hey ChatGPT, I've heard you make a good book club partner. I've just read [Three Musketeers|Count of Monte Cristo] and want to have a discussion about it. Ask me what I think before you tell me what you think, let's go!"""
...I read both of the books recently and it was illuminating to be able to near-instantly explore avenues of insight/criticism of both of the books. Three Musketeers matches fairly closely to Wizard of Oz (vice versa actually), and Monte Cristo raises some really interesting questions if you view "The Count" as basically a fallen angel of divine justice (and the benefits/costs to him via that role).
Since my circle of IRL people who'd read both the unabridged books in the last month is infinitesimally small, it was one of my first "arms-length" test cases of "The GPT's" for fitness-for-purpose. I'm still a bit muddy on throwing a bunch of personal data and thoughts at remote servers (or becoming dependent on that interaction pattern), but digging in and analyzing old books was a great kind of gut-check and something I enjoy doing when finishing a book.
I know it's regurgitating a bunch of reddit comments and academic books/papers (in Dumas's case), but overall- highly recommended!
Yes... clearly running to something that is "regurgitating a bunch of reddit comments and academic books/papers" is much, much better than finding a couple of actual humans that read books, and then talking to them. Peak AI right there.
I get your angle, but have you ever read the discourse between humans regarding fiction?
I mean humans have made death threats towards other humans about whether or not Han shot first.
Fiction-fan discourse is a very low bar on the rankings of human social interaction. I'm not saying that makes it replaceable and trivial, but let's not pretend that every fiction discussion with another honest-to-god human being is a Rembrandt.
You can say this about virtually any human interaction; I'm often amazed at the sort of nonsense some people think is vitally relevant. I would far prefer talking to other humans about fiction and risk the occasional nutcase (that I can walk away from and ignore) than retreat into "tell me plausible rehashes of rehashes of other people's thoughts and don't upset me with all that icky human interaction stuff".
"""I have an idea for a movie club, where two movies with a tenuously connected theme are watched (separately) and then discussed. If you've seen the movies "XXX", and "YYY", tell me what is similar about them, what's different, what are some possible "connected themes" and who tackled the topic better?"""
...time passes...
"""Now that you understand the idea behind these pairings, recommend five more pairings, but don't give any hints as to their connections, just five bullet points with "A vs B" movie titles. Bonus points if there is at least a 10-year gap between them, and they are both not box-office blockbusters (but make sure they are slightly more popular or recognizable movies, not exclusively low-distribution non-critically-acclaimed indie movies)."""
* Children of Men vs Snowpiercer
* Lost in Translation vs Frances Ha
* No Country for Old Men vs Hell or High Water
* The Prestige vs The Illusionist
* Drive vs Nightcrawler
...I know the guidance is "don't just post AI output", but this is specifically a human-to-human discussion around novel(?) ways to interact with AI/LLMs. I've found they're _really_ good at conceptual Venn diagrams.
There's a book "Algorithms to Live By" (ie: look for matching socks via BFS/DFS or whatever). Asking the AI: "you know a bunch of algorithms, what are the top three that should have been in the book?" => "what are the weakest that could have been removed?"
Recently during performance reviews, we had to write our self-assessment and had guidance from on high like: "make sure you talk about people skills, technical skills, customer impact, etc." ...so yada yada: "I'm so amazing, I'm so great" => "Dear AI, I've been given this guidance `...`, please compare my handcrafted storytelling against the guidance `...` and tell me where I have missed covering a requirement" => "...now please help simplify or clean up the section on $INCREDIBLE_TECHNICAL_ACHIEVEMENT b/c I was focusing on describing my personal impact, but need help making it more digestible for others".
The combination of instant, tailored feedback and the fact that they've read the whole internet, "watched" every movie (read the script, read critics' reviews, reddit, forum discussions, etc), read most published books, and that they're 80%+ of a plumber, doctor, lawyer, car mechanic, etc. makes them an unstoppable research assistant, especially when making cross-domain connections that would normally be "expensive" to chase down.
Example: ask a [doctor+lawyer+plumber] about the health and legal impacts of lead solder in pipes or whatever. Instead of needing to schedule 3 people's times, wait for them, pay them, etc, you can get instant "free" feedback, educate yourself, and then have a more solid foundation to branch out from there. Such incredibly useful tools!
Tell me about the 3M ones? I've been considering the AM/BT with AA batteries but they seem slightly derpy, and I've been happy with the SteelSeries Arctis Nova w/ swappable batteries and 2.4GHz for my office work.
AA batteries b/c then it'll "last forever", 3M b/c it's basically passive noise cancellation, BlueTooth so it'll connect to phones (hopefully without that digital static that I'll hear with some BT devices). The AM/FM portion is an anti-feature, but mandatory to get access to AA power.
TLDR: Mail storage is the sender's responsibility. The message isn't copied to the receiver. All the receiver needs is a brief notification that a message is available.
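A minimal sketch of the flow, assuming plain HTTP and entirely made-up endpoints (this isn't any real protocol):

```bash
# Sender: host the message on a server the sender controls...
MSG_URL="https://sender.example/messages/7f3a9c"

# ...and deliver only a tiny notification to the receiver:
curl -X POST "https://receiver.example/inbox" \
     -d "{\"from\": \"alice\", \"msg\": \"${MSG_URL}\"}"

# Receiver fetches (or ignores) the message whenever they choose;
# note the sender keeps control of the stored message afterwards.
curl "${MSG_URL}"
```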
Sounds like a horrible system where you retain many of the problems of email (you still need to deliver notifications) with new surveillance, persistence, and mutability problems layered on top.
I asked this question a while back (the "only train w/ Wikipedia" LLM) and got pointed to the general-purpose "compression benchmarks" page: `https://www.mattmahoney.net/dc/text.html`
While I understand some of the fundamental thoughts behind that comparison, it's slightly wonky... I'm not asking "compress Wikipedia really well", but instead "can a 'model' reason its way through Wikipedia" (and what does that reasoning look like?).
Theoretically, with Wikipedia-multi-lang you should be able to reasonably nail machine translation, but if everyone is starting with "only Wikipedia", then how well can they keep up with the wild-web-trained models on similar per-task benchmark performance?
If your particular training technique (using only Wikipedia) can go from 60% of SOTA to 80% of SOTA on "Explain why 6-degrees of Kevin Bacon is relevant for tensor operations" (which is interesting to plug into Google's AI => Dive Deeper...), then that's a clue that it's not just throwing piles of data at the problem, but instead getting closer to extracting the deeper meaning (and/or reasoning!) that the data enables.
I'm very near the idea that "LLMs are randomized compilers", and that the human prompts should be treated with 1000% more care. Don't (necessarily) git-commit the whole megabytes of token-blathering from the LLM, but keep the human prompts:
"Hey, we're going to work on Feature X... now some test cases... I've done more testing and Z is not covered... ok, now we'll extend to cover Case Y..."
Let me hover over the 50-100 character commit message and then see the raw discussion (source) that led to the AI-generated (compiled) code. Allow AI.next to review the discussion/response/diff/tests and see if it can expose any flaws with the benefit of hindsight!
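One low-tech way to sketch this today is with git-notes; the `prompts` ref name and the `prompts.txt` transcript file are just my assumptions, not an established convention:

```bash
# Commit the AI-generated ("compiled") code as usual...
git add -A
git commit -m "Add Feature X with tests for Case Y"

# ...then attach the human prompt transcript (the "source") to it:
git notes --ref=prompts add -F prompts.txt HEAD

# Later, "hover over" a commit and read the discussion behind it:
git log --notes=prompts -1
```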
I worked at Yahoo during its (in retrospect) decline.
It used to be hard to be "web scale" and available; now that's either k8s or a few checkboxes in AWS.
Yahoo used to be able to "coast" on how compelling its services were, because 80% attractive with 100% available and 100% global reach crushes 90% attractive with 95% available and 25% global reach.
I was often confused by the hyperfocus of analysts asking "Is Y! a tech company or a content company?"
What they were really asking was whether we should value Yahoo! as a 30%+ margin on putting ads next to Yahoo! News articles, or a 10x multiplier on originating the next GMail/Search.
I think "data is the only moat", and in a way that goes back to the "first to market / eBay" POV, and the difference between first to market and fast follower is super interesting!
Very true, but you're a decade or two late for that. IIRC Cash4Clunkers put like a $3k floor on used car value (~$5-7k in today's dollars), meaning you'd never sell your old car for $2k to an individual when you could sell it to the government for $3k.
Per Google it was started in 2009, which means any car worth less than $5k around 17 years ago isn't materially impacting new or used car prices today.
...but seriously... there was the "up until 1850" LLM or whatever... can we make an "up until 1920 => 1990 [pre-internet] => present day" series and then keep prodding the "older ones" until they "invent their way" to the newer years?
We knew more in 1920 than we did in 1850, but can a "thinking machine" of 1850-knowledge invent 1860's knowledge via infinite monkeys theorem/practice?
The same way that in 2025/2026, Knuth has just invented his way to 2027-knowledge with this paper/observation/finding? If I only had a Beowulf cluster of these things... ;-)
I want to pitch to my local makerspace "log-10" manufacturing.
Basically there's a ton of traction at the zero-to-one stage (making the first prototype), and then you start looking at how to "scale" your manufacturing (ie: making 10 at a whack), and then eventually you MAY get to building/assembling 100 at a whack, and up to 1000's or more (where you'd "graduate" to partnering with a "real" manufacturer).
Maybe it's just the way that I'm wired, but I've done 3-4 projects where I've gone down the B.O.M. rabbit hole and have scaled to at least 100 assembled/packaged items.
It seems like a local makerspace is the perfect launch-pad for having flexible "staff" (ie: other makerspace members) that can handle ambiguity and would be invested in the success of a locally owned/managed product!