Hacker News | hyperadvanced's comments

You can study the LLM output. In the “before times” I’d just clone a random git repo, use a template, or copy and paste stuff together to get the initial version working.


Studying gibberish doesn't teach you anything. If you were cargo culting shit before AI you weren't learning anything then either.


Necessarily, LLM output that works isn't gibberish.

The code that LLMs output has worked well enough to learn from since the initial launch of ChatGPT. This even though back then you might have to repeatedly say "continue" because it would stop in the middle of writing a function.


  Necessarily, LLM output that works isn't gibberish.
Hardly. Poorly conjured up code can still work.


"Gibberish" code is necessarily code which doesn't work. Even in the broader use of the term: https://en.wikipedia.org/wiki/Gibberish

Especially in this context, if a mystery box solves a problem for me, I can look at the solution and learn something from that solution, c.f. how paper was inspired by watching wasps at work.

Even the abject failures can be interesting, though I find them more helpful for forcing my writing to be easier to understand.


It's not gibberish. More than that, LLMs frequently write comments (some are fluff but some explain the reasoning quite well), variables are frequently named better than cdx, hgv, ti, stuff like that, plus looking at the reasoning while it's happening provides more clues.

Also, it's actually fun watching LLMs debug. Since they're reasonably similar to devs while investigating, but they have a data bank the size of the internet so they can pull hints that sometimes surprise even experienced devs.

I think hard earned knowledge coming from actual coding is still useful to stay sharp but it might turn out the balance is something like 25% handmade - 75% LLM made.


  they have a data bank the size of the internet so they can
  pull hints that sometimes surprise even experienced devs.
That's a polite way of phrasing "they've stolen a mountain of information and overwhelmed resources that humans would otherwise use to find answers." I just discovered another victim: the Renesas forums. Cloudflare is blocking me from accessing the site completely, the only site I've ever had this happen to. But I'm glad you're able to have your fun.

  it might turn out the balance is something like 25% handmade - 75% LLM made.
Doubtful. As the arms race continues AI DDoS bots will have less and less recent "training" material. Not a day goes by that I don't discover another site employing anti-AI bot software.


> they've stolen a mountain of information

In law, training is not itself theft. Pirating books for any reason including training is still a copyright violation, but the judges ruled specifically that the training on data lawfully obtained was not itself an offence.

Cloudflare has to block so many more bots now precisely because crawling the public, free-to-everyone internet is legally not theft. (And indeed it would struggle to be, given all search engines have for a long time been doing just that.)

> As the arms race continues AI DDoS bots will have less and less recent "training" material

My experience as a human is that humans keep re-inventing the wheel, and if they instead re-read the solutions from even just 5 years earlier (or 10, or 15, or 20…) we'd have simpler code and tools that did all we wanted already.

For example, "making a UI" peaked sometime between the late 90s and mid 2010s with WYSIWYG tools like Visual Basic (and the mac equivalent now known as Xojo) and Dreamweaver, and then in the final part of that a few good years where Interface Builder finally wasn't sucking on Xcode. And then everyone on the web went for React and Apple made SwiftUI with a preview mode that kept crashing.

If LLMs had come before reactive UI, we'd have non-reactive alternatives that would probably suck less than all the weird things I keep seeing from reactive UIs.


> Cloudflare has to block so many more bots now precisely because crawling the public, free-to-everyone internet is legally not theft.

That is simply not true. Freely available on the web doesn't mean it's in the Public Domain. The "lawfully obtained" part of your argument is patently untrue. You can legally obtain something, but that doesn't mean any use of it is automatically legal as well. Otherwise, the recent Spotify dump by Anna's Archive would be legal as well.

It all depends on the license the thing is released under, chosen by the person who made it freely accessible on the web. This license is still very emphatically a legally binding document that restricts what someone can do with it.

For instance, since the advent of LLM crawling, I've added the "No Derivatives" clause to the CC license of anything new I publish to the web. It's still freely accessible, can be shared on, etc., but it explicitly prohibits using it for training ML models. I even add an additional clause to that effect, should the legal interpretation of CC-ND ever change. In short, anyone training an LLM on my content is infringing my rights, period.


> Freely available on the web doesn't mean it's in the Public Domain.

Doesn't need to be.

> The "lawfully obtained" part of your argument is patently untrue. You can legally obtain something, but that doesn't mean any use of it is automatically legal as well.

I didn't say "any" use, I said this specific use. Here's the quote from the judge who decided this:

  5. OVERALL ANALYSIS.
  After the four factors and any others deemed relevant are “explored, [ ] the results [are] weighed together, in light of the purposes of copyright.” Campbell, 510 U.S. at 578. The copies used to train specific LLMs were justified as a fair use. Every factor but the nature of the copyrighted work favors this result. The technology at issue was among the most transformative many of us will see in our lifetimes.
- https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

> Otherwise, the recent Spotify dump by Anna's Archive would be legal as well.

I specifically said copyright infringement was separate. Because, guess what, so did the judge the next paragraph but one from the quote I just gave you.

> For instance, since the advent of LLM crawling, I've added the "No Derivatives" clause to the CC license of anything new I publish to the web. It's still freely accessible, can be shared on, etc., but it explicitly prohibits using it for training ML models. I even add an additional clause to that effect, should the legal interpretation of CC-ND ever change. In short, anyone training an LLM on my content is infringing my rights, period.

It will be interesting to see if that holds up in future court cases. I wouldn't bank on it if I was you.


> That's a polite way of phrasing "they've stolen a mountain of information and overwhelmed resources that humans would otherwise use to find answers."

Yes, but I can't stop them, can you?

> But I'm glad you're able to have your fun.

Unfortunately I have to be practical.

> Doubtful. As the arms race continues AI DDoS bots will have less and less recent "training" material. Not a day goes by that I don't discover another site employing anti-AI bot software.

Almost all these BigCos are using their internal code bases as material for their own LLMs. They're also increasingly instructing their devs to code primarily using LLMs.

The hope that they'll run out of relevant material is slim.

Oh, and at this point it's less about the core/kernel/LLMs than it is about building ol' fashioned procedural tooling aka code around the LLM, so that it can just REPL like a human. Turns out a lot of regular coding and debugging is what a machine would do, READ-EVAL-PRINT.

I have no idea how far they're going to go, but the current iteration of Claude Code can generate average or better code, which is an improvement in many places.


  The hope that they'll run out of relevant material is slim.
If big corps are training their LLMs on their LLM written code…


You're almost there:

> If big corps are training their LLMs on their LLM written code <<and human reviewed code>>…

The last part is important.


Until the humans are required to (or just plain want to) use LLMs to review the code.


It’s crazy how fast the tables turned on SWE being barely required to do anything to SWE being required to do everything. I quite like the 2026 culture of SWE but it’s so much more demanding and competitive than it was 5 or 10 years ago


You could argue (it certainly has been argued) that the ability for technology to dissolve the usually more coherent identities that we take on daily by granting unlimited role play, trolling, and exploration is simply too much for a lot of people, and makes it hard to maintain a coherent sense of self. This is especially true of people who are “internet addicts” - not that the designation means a whole lot as I’m here at the gym talking to you on the phone.

Don’t get me wrong, I mostly agree with your comment. I think even more dastardly is the tendency for the internet to market new personalities to you, based on what’s profitable


There's also the inconvenient truth that a very specific part of the world was online in the 1990s.

Primarily more educated, more liberal, more wealthy.

Turns out, when you hook the rest of the planet online, you get mass persuasion campaigns, fake genocide "reporting", and enough of an increase in ambient noise that coherent anonymous discourse becomes impossible.

I mean, look at the comments on Fox News or political YouTube videos. That's the real average level of discussion.


It's still possible in smaller, constructive communities, not in large general-purpose social networks.


As a hn poster, I agree with this


The 1990s internet was definitely not more liberal! 4chan style forums were probably the rule. I can’t believe someone would say that, clearly you didn’t use the same internet that I did.


He didn't say the internet was more liberal, he said the people on it were.

Before you start forming your reply, think about the actual culture back then. If you take slashdot as somewhat representative of the 90s internet culture, it was basically anti-corporate, meritocratic, non-judgmental, irreligious, educated, non-discriminatory, and once 2000 came around tended to be highly critical of the Bush agenda.

4chan at that time and places like it represented more of an edgelord culture, where showing vulnerability or sensitivity was shunned, everything revered by the larger populace was ruthlessly mocked, and distrust of society and government in general was taken as natural. Calling them conservative would have been nonsensical.


Exactly. If I had to characterize the general internet (read: what would and wouldn't raise an eyebrow in an average forum) in terms of political alignment, it'd probably be:

   - anarchist 60s/70s
   - libertarian-meritocracy 80s/90s
   - capitalist-meritocracy-liberal 00s
   - polarized liberal-globalist vs conservative-reactionary 10s
   - polarized liberal-individualist vs conservative-statist 20s
That SA / 4chan (both of which were really post-90s) existed was in no way proof of an anti-liberal bent. Their very edgelordness was an implicit reveling in absolute freedom of expression (even if their later liberal-pro-censoring and alt-right splinter movements subsequently forgot that).


4chan was very much left-wing to liberal until Stormfront invaded them back. After Caturday came Soviet Sunday.


The response (usually) is “OK but whatabout the $X billion we spent on the military?”

Which isn’t wrong necessarily, but it doesn’t answer why or whether we should be spending so much money on everything else


I actually agree here too. America (and Americans) spend waaaay too much, and especially on niche things that profit very specific subgroups. We need to get back to the basics. Johnny can't read[1], or do math. That should be funded long before we worry about today's PhDs, those kids are the pipeline of future PhDs.

/r

[1]-https://www.forbes.com/sites/ryancraig/2024/11/15/kids-cant-...


State and local governments tend to be responsible for public education funding. In the US, the federal government provides <10% of the funding.

US public education spending is also top 5 in the world so I don't think a lack of money is why "Johnny can't read or do math", something else is going on


Same. I find that if I can piecemeal explain the desired functionality and work as I would pairing with another engineer that it’s totally possible to go from “make me a simple wheel with spokes” to “okay now let’s add a better frame and brakes” with relatively little planning, other than what I’d already do when researching the codebase to implement a new feature


It's quite interesting because it makes me wonder how we make it efficient and predictable. The human language is just too verbose. There must be some DSL, some more refined way to get to the output we need. I don't know whether it means you actually just need to provide examples or something else. But you know code is very binary, do this do that. LLMs are really just too verbose even in this format right now. That higher layer really needs a language. I mean I get it. It's understanding human language and converting it to code. Very clever. But I think we can do better.


Source? Source? Got any source about me? Yeah well those statistics only deal with other people who aren’t me, so I guess you’re not really trusting the science :/


This is just plain wrong, I vehemently disagree. What happens if a payment fails on my API, and today that means I need to go through a 20-step process with this pay provider, my database, etc. to correct that. But what’s worse is if this error happens 11,000 times and I run a script to do my 20 step process 11,000 times, but it turns out the error was raised in error. Additionally, because the error was so explicit about how to fix it, I didn’t talk to anyone. And of course, the suggested fix was out of date because docs lag vs. production software. Now I have 11,000 pissed off customers because I was trying to be helpful.


I agree. The reason Cursor’s “first mover” advantage doesn’t matter is that there’s fundamentally no business there. I’ve used 3 IDEs or text editors my whole life, and I’ve never paid for one. If I wanted, I could use AI to write myself a new text editor. Like you said, there’s no moat for any of this shit, and I’m guessing that by 2027 the music will stop.


Ironically, AI is really good at the adding tests later thing. It can really help round out test coverage for a piece of code and create some reusable stuff that can inspire you to test even more.

I’m not a super heavy AI user but I’ve vibe coded a few things for the frontend with it. It has helped me understand how you lay out React apps a little better and how the legos that React gives you work. Probably far less than if I had done it from scratch and read a book, but sometimes a working prototype is so much more valuable to a product initiative than learning a programming language that you would be absolutely burning time and value by not vibe coding the prototype.


Yes, game theory is not a predictive model but an explanatory/general one. Additionally not everything is a game, as in statistics, not everything has a probability curve. They can be applied speculatively to great effect, but they are ultimately abstract models.


You can use it for either predictive or explanatory purposes. In the early ('00s) years of Google it was common to diagram out the incentives of all the market participants; this led to such innovations as the use of the second-price VCG auction [1] for ad sales that now make over a third of a trillion dollars per year.

[1] https://en.wikipedia.org/wiki/Vickrey%E2%80%93Clarke%E2%80%9...
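
For anyone unfamiliar with the mechanism, here is a minimal sketch of a single-item sealed-bid second-price auction, which is what VCG reduces to when there is only one item. The function name and the bidder names are made up for illustration, and this is not a description of how Google's ad auction is actually implemented (real ad auctions involve multiple slots, quality scores, and reserve prices).

```python
def second_price_auction(bids):
    """Run a single-item sealed-bid second-price (Vickrey) auction.

    bids: dict mapping a (hypothetical) bidder name to its bid amount.
    Returns (winner, price): the highest bidder wins but pays the
    second-highest bid rather than their own.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # Rank bidders from highest bid to lowest.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # second-highest bid sets the price
    return winner, price


# Illustrative bids only; not real data.
example_bids = {"alice": 5.0, "bob": 3.0, "carol": 4.0}
winner, price = second_price_auction(example_bids)
print(winner, price)  # alice 4.0
```

Because the winner's own bid does not set the price, bidding your true value is a dominant strategy in this single-item setting, which is the incentive property the diagramming exercise was after.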


Google has now (mostly?) transitioned to using first-price, and more complicated (opaque) auction-style pricing for many of its advertising products.

https://blog.google/products/admanager/simplifying-programma...


> It’s important to note that our move to a single unified first price auction only impacts display and video inventory sold via Ad Manager. This change will have no impact on auctions for ads on Google Search, AdSense for Search, YouTube, and other Google properties, and advertisers using Google Ads or Display & Video 360 do not need to take any action.


This is a good point, and I’m willing to concede that I may be wrong here, that predictive power can be gained from (possibly mis-) applications of abstractions onto a real space, which is one of the real reasons to favor abstract thinking in business in the first place.

