Hacker News | new | past | comments | ask | show | jobs | submit | tossandthrow's comments | login

Likely, but through regulation, not AI

This will likely be alleviated when AI-first projects take over as important OSS projects.

For these projects, everything "tribal" has to be explicitly codified.

On a more general note: this is likely going to have a rather big impact on software in general - the "engineer the company cannot afford to lose" is likely losing their moat entirely.


I have switched entirely away from anything Deno, even though I used it in Supabase.

But I need to have everything in a mono repo for agents to properly work on it.

Cloud functions and weak separation between dev and prod are a mess, even more so with agents in the loop.


> But I need to have everything in a mono repo for agents to properly work on it.

Why was this a problem with Deno? Up until recently you had to use package.json and npm/pnpm for it to work, but even then it was better than Bun or Node since you could use import map to avoid compiling packages for testing etc (Node and Bun's type stripping doesn't work across local monorepo dependencies, and tsx produces mangled source maps making debugging a hassle). Now Deno has built-in workspace/monorepo support in deno.json.
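For reference, the built-in support mentioned here boils down to a root deno.json with a workspace list - a minimal sketch (the member paths are made up for illustration):

```jsonc
{
  // Root deno.json: each entry points at a folder with its own deno.json
  "workspace": ["./packages/app", "./packages/lib"]
}
```

Each member then carries its own deno.json with its name and exports, and imports resolve across members without a compile step.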


> But I need to have everything in a mono repo for agents to properly work on it.

Why is that? Seems like an agent framework limitation, not a reasonable requirement in general. (I do not have this limitation, but I also have a custom agent stack)


I've found myself occasionally wishing I had a monorepo purely for Claude Code for web (Anthropic's hosted version of Claude Code), since it can currently only work with one private repository at a time.

On my own machine I have a dev/ folder full of checkouts of other repos, and I'll often run Claude Code or Codex CLI in that top level folder and tell it to make changes to multiple projects at once. That works just fine.


The "dev/" folder concept is what I give my agent, so I can select what I want it to have access to. On my computer, I have a few of those to group those that go together.

Couldn’t you make a pseudo monorepo via git submodules?
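For the local case at least, this is a few commands - a hypothetical sketch where all repo names and paths are made up for illustration:

```shell
# Hypothetical sketch: stitching two repos into a "pseudo monorepo" via
# submodules. All names and paths here are made up for illustration.
set -e
rm -rf /tmp/pseudo-mono-demo
mkdir -p /tmp/pseudo-mono-demo
cd /tmp/pseudo-mono-demo

# Stand-in for an existing repo you would normally clone from a remote:
git init -q lib-a
git -C lib-a -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"

# The umbrella repo that stitches things together:
git init -q workspace
cd workspace
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"

# Recent git versions block file-path submodules by default (CVE fix);
# allow it for this local demo only.
git -c protocol.file.allow=always submodule add ../lib-a lib-a
git submodule status   # lib-a is now pinned to a specific commit
```

Note the catch this illustrates: each submodule is pinned to a commit, so every upstream change needs an extra bump-and-commit in the umbrella repo.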

I don't think there's a way to have that work in Claude Code for web, since each checkout there uses a custom GitHub access token scoped to a single repository.

GitHub tokens can span more than one repo or org if the account requesting has access to them. Is this supported on the non-web version?

(I was going to try Claude again this weekend, but when I tried to log in, I got an error and was reminded how much downtime I experience with Anthropic, argh...)


The non-web version uses whatever credentials you have set up yourself, so it works just fine.

Is that host level or can I provide scoped tokens based on what I'm doing?

In other words, do Anthropic tools provide any affordances for this or is it something I have to manage externally?


I'm just talking about the version of Claude Code which runs in containers on their infrastructure here - they call it "Claude Code on the web" (terrible name) and you access it through their native apps or from https://claude.ai/code

That product only works against one GitHub repo at a time. The Claude Code you install and run locally doesn't have a GitHub attachment at all and can run against whatever you check out yourself.

Here's an open feature request about it: https://github.com/anthropics/claude-code/issues/23627


Submodules are pain, use the dependency management systems for the languages in your monorepo.

This site (from nx), while biased, explains it best. https://monorepo.tools/

In a poly repo setup, agents are less effective having to infer changes across repo boundaries using specs rather than code as context. Changes that impact multiple repos are also much messier to wrangle.


Monorepos come with a lot of pain too. Two sides of the same coin. I manage the build system for a large monorepo. Questions that will get you to a primary source of pain...

How do you minimally build based on the changeset? How do you know this is sufficient for correctness? What happens when feature branches get out of date and don't see the upstream change that breaks the local branch? How do you version subprojects, as they change or as a whole?

Monorepos have a habit of creating hidden dependencies. The languages you use can help or hurt here.


I think media outlets think way too highly of their contribution to AI.

Had they never existed, it would likely not have made a dent in AI development - just as being twice as productive would likely not have made a dent in the quality of LLMs.


How do you think those models get trained? You can only get so far with Wikipedia, Reddit, and non-fiction works like books and academic papers.

Have a look at this article: https://www.washingtonpost.com/technology/interactive/2023/a...

NY Times is 0.06% of common crawl.

These news media outlets provide a drop in the ocean's worth of information, both qualitatively and quantitatively.

The news / media industry is really just trying to hold on to their lifeboat before inevitably becoming entirely irrelevant.

(I do find this sad, but it is the reality - I can already get considerably better journalism from LLMs than from actual journalists - both the clickbait stuff and the high-quality stuff)


That seems like a reductive way to consider it. What percent of music was created by Led Zeppelin? What percent of art was painted by Monet? What percent of films by Alfred Hitchcock? It may be a small percentage objectively but they are hugely influential.

I don't think back propagation cares whose text it is backpropagating.

The data sets aren't naively fed into the training runs.

Instead, training attempts to sample more heavily from higher quality sources, with, I'm sure, a mix of manual and heuristic labeling.


fwiw, no llm ive ever used generates text in the writing style newspapers and news sites use - hence i honestly doubt they've been given a meaningful boost in relevancy.

their idioms would leak occasionally otherwise


90% of common crawl is complete junk, while the tiny bit of news articles powers almost all the AI answers in Google search.

News takes a very different path to get into search results. It's not going through databases or archive passes, that would take far too long.

And don't basically all those news sites allow google on purpose?


How many Reddit, HN, etc. posts are based on NYT articles? How many derivative news articles, blog posts, YouTube videos, TikToks, etc. are responses to those articles?

At least NYT is probably on the correct side of Sturgeon’s Law: https://en.wikipedia.org/wiki/Sturgeon%27s_law


> How many Reddit, HN, etc. posts are based on NYT articles? How many derivative news articles, blog posts, YouTube videos, TikToks, etc. are responses to those articles?

You may get an inconvenient answer when you ask the question the other way around.


0.06% is way higher than I would expect

How does the entire textual corpus of, say, the New York Times compare to all novels? Each article is a page of text, maybe two at most? There certainly are an awful lot of articles, but it's hard to imagine it amounts to much more than a couple hundred novels. There must be thousands of novels released each year.

Like apples to oranges.

LLMs are (apparently) massively used to get information about topics in the real world. Novels aren't going to be much help there. Journalism, particularly in written form, provides a fount of facts presented from different angles, as well as opinions, and it was all there free for the taking…

Wikipedia provides the scantest summary of that, fora and social media give you banter, fake news, summaries of news, and a whole lot of shaky opinions, at best. Novels give you the foundations of language, but in terms of knowledge nothing much beyond what the novel is about.


LLMs can get up to date information from primary sources - no journalists required.

I don't understand how LLMs can ask questions at a press conference.

To begin with, your premise is that the only primary sources are press conferences and that press conferences only provide information in response to questions.

But even taking it literally, isn't that one of the things LLMs could actually do? You're essentially asking how a text generator could generate text. The real question is whether the questions would be any good, but the answer isn't necessarily no.


I'm sure any competent agent would send an email, or just ask as an aside in a chat.

Startup idea right there.

I don't think an LLM can have secret human sources that provide them with confidential information anonymously. Not all news shows up on Twitter.

You don't need the secret human sources any more.

You used to need them, because journalists had the distribution and the sources didn't. In a world of printed newspapers, you couldn't get your story distributed nationally (much less worldwide) without the help of a journalist, doubly so if you wanted to stay anonymous.

Nowadays, you just make a Substack and there's that.

See that recent expose on the Delve fraud as just one example. No journalists were harmed in the making of that article.


This is technically trivial. Most data comes from chats these days, not the web.

Start thinking!


The primary source for most news is journalism.

In context, primary source means the subject of the article (the thing the journalist is writing about).

Journalism is by definition a secondary source. (Notwithstanding edge cases like articles reporting directly on the news industry itself.)


Journalism is absolutely not by definition a secondary source.

If a journalist is on location covering a flood, for example, they are the primary source.

A journalist conducting an interview would also be a primary source.


Primary sources can be, and often are, very biased. Journalists are (supposed to be) doing fact checks and gathering multiple sources from all sides. Modern journalism is in a terrible state, but it is still important.

Imagine if all info about Facebook came from Facebook...


By talking to users? I'm 100% sure Google and OpenAI know every major news story, in much greater detail, long before NYT publishes it.

I'd imagine they already have a database of users such that, if multiple people talk about a possibly true subject, they can ask subject experts or users related to the subject for clarification and further information.


If excluding these sources wouldn’t make a difference, why do AI companies scrape them despite explicit requests to not be scraped?

They want as diverse a data set as possible?

It is not like they paid anybody else for their contribution either.

It is just not worth more than anything else in the data sets.


Define quality.

Many publications put information on the internet for the first time, or curate it for the first time, or research a topic deeper than ever before. Someone - a thinking, feeling human - had to get out there and try restaurants, talk to people, pore through archives, read books, use products. Each of them contribute a little to what we know about the world.

I do this for a living. AI might soon put me out of work. It already more than halved my audience, using my own work. It's sickening to see people cheer for it because they have a bone to pick with certain websites. Eventually those websites will be gone, but so will the good ones that produced critical information.


Isn't the non-LLM generated text becoming more valuable for training as the web at large is flooded with slop?

Preventing new human generated text from being used by AI firms (without consent) seems like a valid strategy.


No.

Modern LLMs are trained on a large percentage of synthetic data.

This sentiment is largely legacy (even though just a couple of years old).


They likely don't need tenant isolation, and unbounded table reads can be mitigated using timeouts.

We do something similar for our backoffice - just with the difference that it is Claude that has full freedom to write queries.
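For what it's worth, in Postgres the timeout part is a one-liner - a sketch, assuming a dedicated role for the agent (the role name is hypothetical):

```sql
-- Cap any single statement from the agent's role at 5 seconds.
-- (claude_readonly is a hypothetical role name.)
ALTER ROLE claude_readonly SET statement_timeout = '5s';

-- Or per session, before handing the connection to the agent:
SET statement_timeout = '5s';
```

Combined with a read-only role, this bounds both the blast radius and the runtime of any query the agent writes.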


Can Claude drop tables?

Love that you needed to make it clear that it is humans that can explain themselves..

Employees can only be held accountable in cases of severe malice.

There is a good chance that the person actually responsible (e.g. the CEO or someone delegated to be responsible) will soon prefer to have AIs do the work, as their quality can be quantified.


What theory is that?

My experience is the absolute opposite. I am much more in control of quality with AI agents.

I am never letting junior to midlevels into my team again.

In fact, I am not sure I will allow any form of manual programming in a year or so.


> I am never letting junior to midlevels into my team again

Exactly. You control the quality of the people in your team. You can train, fire, hire, etc until you get the skill level you want.

You have effectively no control over the quality of the output from an LLM. You get what the frontier labs give you and must work with that.


That is not correct.

It is much easier to control the quality of an AI than of inexperienced developers.


I think we are talking past each other.

> I am never letting junior to midlevels into my team again

My point is, you control the experience level of the engineers on your team. The fact that you can say you won't let junior or midlevels on your team proves that.

You do not have that level of control with LLMs. Anthropic and OpenAI are roughly the same quality at any given time. The rest are not useful.


Ah, so that is not entirely correct.

I can control LLMs through skills and other gateways.

There are still tasks that LLMs do not really carry out that well, where a proper senior is needed.

But these tasks are quickly disappearing, especially as the code base is slowly being optimized for agentic engineering.


Eh. You want a good mix of experience levels, what really matters is everyone should be talented. Less experienced colleagues are unburdened by yesterday’s lessons that may no longer be relevant today, they don’t have the same blind spots.

Also, our profession is doomed if we won’t give less experienced colleagues a chance to shine.


Our profession is likely doomed not because we don't train people, but because of the lack of demand.

> I am never letting junior to midlevels into my team again

From a different one of your posts

So you're the one dooming the profession. Nice work, thank you!


No, I genuinely don't believe there is the future demand for that many developers.

And the developers we do need will not jump through the career progression of junior to senior.

Why the f** would I keep investing in a profession I think is dead or seriously contracting?


Do you not find that depressing and sad? Do you never work with enthusiastic and talented junior developers at the start of their careers? Do you not enjoy interacting with them?

Well...

I think it would be more depressing taking in excited junior developers while not believing that they are growing into any real career.

> ... the start of their careers

It is exactly this assumption I am challenging.

What comes next, I don't know - and I am not trying to kid myself or anyone else that I am well suited as a mentor for a person starting out their career in the current environment.


Such a proposal doesn't need justification. You can merely disagree.

Anyhow. The justification is that it is an important part of a communications infrastructure.

Just like the government finances roads, etc.


I'm not disagreeing with you, but shouldn't free Internet access come before that?

We should be making sure everyone has internet access, but hosting some basic pages is about 1000x cheaper, so no, I don't think free internet access should come before that.

Internet access doesn't seem to be an issue.

Politics is also about making practical choices to advance humanity.


Well, he explains it - it is all deferred spend.

Deferred spending is quite unnatural. That I can work one hour today and buy yoghurt in two years is an artifact of our system.

But this also relies on someone making that yoghurt two years from now.

It is that key dogma that will likely be under pressure for future pensioners.


It is well known that smartphones can be difficult to use with dry skin, which most elderly people have.
