GitHub has announced that they will now train on everyone's data (even private repos) unless you opt out (until they change their mind on that). Anthropic has already been training on your data on certain tiers. Meta bittorrented books to train their models.
Surely if your license says "LLM output trained on this code is legally tainted", it is going to dissuade them.
Is this the first time you're reading HN? Every day there are posts from people describing how AI crawlers are hammering their sites, with no end in sight. Filtering user agents doesn't work because they spoof them, filtering IPs doesn't work because they use residential IPs. Robots.txt is a sweet summer child's dream.
>Claude Code users typically treat the .claude folder like a black box. They know it exists. They’ve seen it appear in their project root. But they’ve never opened it, let alone understood what every file inside it does.
I know we are living in a post-engineering world now, but you can't tell me that people don't look at PRs anymore, or their own diffs, at least until/if they decide to .gitignore .claude.
I don't. I have Claude do all my PR reviews, running in a daily loop in the morning. The truth is an LLM is better at code review than the average programmer.
I'm a senior engineer who has been shipping code since before GitHub and PR reviews were a thing. Thankfully LLMs have freed me from being asked to read other people's shit code for hours every day.
Not that I'm entirely onboard with it, but often you don't have a channel to communicate with "the people who can change the machine", only the cogs in the machine.
Websites routinely access the same URLs over and over in a single page session, especially with aggressive ad refresh. Normally you only incur the first request as server load, not the subsequent ones, because they're served from cache.
I agree. If I see "unfortunately we receive hundreds of applications from people who don't read the job description, please include the word banana in your application" I will be sympathetic. If I see "interview with our AI bot first" I will nope out.