More

zhwu · 2026-03-19T18:00:28 1773943228

The most surprising part: the agent had access to both H100s and H200s. Without being told, it noticed H200s scored better and started screening ideas on H100s, then promoting winners to H200s for validation. That strategy emerged entirely on its own.

rogerrogerr · 2026-03-19T18:13:17 1773943997

Why do we think this emerged “on its own”? Surely this technique has been discussed in research papers that are in the training set.

GorbachevyChase · 2026-03-20T04:48:34 1773982114

You probably express very few truly original ideas. Let’s not set the bar quite so high unless we are all just a sad simulacrum of “pure” thought.

suddenlybananas · 2026-03-20T09:34:41 1773999281

But humans are capable of very many original ideas. Look around you, humans were able to remake the entire world because of these original thoughts.

deadbabe · 2026-03-20T11:38:02 1774006682

Original ideas are easy if you allow for bad ideas.

rullelito · 2026-03-20T11:59:18 1774007958

Then "on its own" has no meaning, i.e. everything an LLM does is "on its own".

fdghrtbrt · 2026-03-19T19:58:48 1773950328

Why surely? Have you never seen an LLM try something new?

rogerrogerr · 2026-03-19T20:17:49 1773951469

Is your assertion that no one has ever written "we tried some stuff on the small inexpensive platform first, then moved to the bigger more expensive platform with the more promising options" in a research paper or literally anywhere else?

fdghrtbrt · 2026-03-19T20:25:50 1773951950

No, that's not my assertion. In fact I asserted nothing at all.

rogerrogerr · 2026-03-19T20:28:14 1773952094

You're speaking in riddles; your communication would be more effective if you didn't do that.

fdghrtbrt · 2026-03-19T20:34:28 1773952468

You said "surely", and I asked:

> Why surely? Have you never seen an LLM try something new?

I'm afraid I can't make it any simpler than this.

And I still don't know the answer to how you're so sure. To me there's several explanations, and it seems to you there's only one.

I'm pretty happy with my communication style.

frank_nitti · 2026-03-19T21:28:49 1773955729

Seems to me the commenter was asking: what observations led us to conclude that original affirmative statement that “the AI did this entirely on its own”.

Given that this is a common technique and not a novel invention, it’s probably present in the training set.

The “surely” reads like it’s referring to the presence of that information in the training set. But your response casts it as saying “surely the AI has not invented something on its own”.

The original question stands IMO, the burden of proof is on whoever is asserting that the AI has invented something on its own, with or without training data that surely already mentions this approach

fdghrtbrt · 2026-03-19T22:52:41 1773960761

There is no burden of proof on me, because I'm not asserting that AI has invented something on its own. I haven't told you what my view is or whether I ever have a view.

The problem with the reasoning of the person I was responding to is that it's assuming "if X is in the training set and LLM outputs X, then it did so because X is in the training set". That does not follow. Conceivably it's possible that X is in the training set and LLM outputs X, but if X hadn't been in the training set the LLM also would've output X.

Lets look at that phrase again:

> Why do we think this emerged “on its own”? Surely this technique has been discussed in research papers that are in the training set.

This phrase implies "if X was in the training set, then LLM couldn't have come up with X on its own". This is false. In fact, my claim that the implication is false is testable, in the following manner: Have two training sets, T and T'. In T, X is present. In T' you've removed X but left X-adjacent things. Train LLM A on T and A' on T'. Find a prompt that requires that A outputs X. If on the same prompt A' also outputs X, that's an example of my claim. To repeat, my claim is "it's possible that X is in the training set and LLM outputs X, but if X hadn't been in the training set the LLM also would've output X."

In fact, I've just realized I even have a method for constructing (T, T') that guarantees what I've described. Not sure if it's worth a paper on its own though.

rogerrogerr · 2026-03-19T22:59:46 1773961186

Your pure logic is probably right; I do not have the time or interest to dissect it.

But you’re missing the context and implication: “doing new stuff” is the major achievement we’re looking for next from LLMs. Seeing something that is “new” and is not in the training set is interesting in a way that something contained in the training set is not.

We cannot introspect LLMs meaningfully yet, so the difference between “came up with myself and it’s in the training set incidentally” and “applied a concept in the training set” is not meaningful.

nl · 2026-03-20T05:22:13 1773984133

I think the number of new math proofs generated by LLMs over the last few months has conclusively proven that yes - they can "come up with things themselves"

A few examples: Axiom's proof of Fel’s open conjecture on syzygies of numerical semigroups: https://x.com/axiommathai/status/2019449659807219884

Erdos 457: https://www.erdosproblems.com/457

The stronger form of Erdos 650: https://www.erdosproblems.com/650

caconym_ · 2026-03-19T20:37:19 1773952639

I honestly don't think I have.

In this case, using a cheap(er) signal or heuristic as an initial filter before spending more resources on cases that pass the filter is a pattern that shows up all over the place, and LLMs are good at picking up on patterns like that and generalizing them. AFAICT.

anon291 · 2026-03-19T22:35:53 1773959753

I'm not sure how people say this so confidently. I have a rather esoteric haskell library that I've written and published for years. ChatGPT and Claude both know about it and frequently help me improve it, and propose completely novel approaches. I'm really not sure how people are so confident that they can't think of anything new. This seems like wishful confirmation bias.

caconym_ · 2026-03-20T02:36:35 1773974195

> I'm not sure how people say this so confidently.

Say what, exactly?

hhh · 2026-03-19T18:33:38 1773945218

Why?… The experiment.yaml shows that it is calling h100/200 explicitly, it’s pretty common for humans to say “number bigger more gooder” for anything… Lie and reverse the values and see what happens. I would put money on a rabbit hole of complaining about it being misconfigured.

ed · 2026-03-19T18:52:43 1773946363

Models are familiar with H100’s. They even predate ChatGPT.

Aboutplants · 2026-03-19T18:04:52 1773943492

Yeah I thought that was a particularly neat part

zhwu · on March 23, 2025

Cloud services, such as autoscaling EKS or AWS Batch are mostly limited by the GPU availability in a single region. That limits the scalability of jobs that can run distributedly in a large scale.

AI batch inference is one of the examples, and this post found that by going beyond a single region, it is possible to speed up the important embedding generation workload by 9x, because of the available GPUs in the "forgotten" regions.

This can significantly increase the iteration speed for building applications, such as RAG, and AI search. We share our experience for launching a large amount of batch inference jobs across the globe with the OSS project SkyPilot.

TL;DR: it speeds up the embedding generation on Amazon review dataset with 30M items by 9x and reduces the cost by 61%.

zhwu · on March 5, 2025

This recent blog actually looks into the case with multiple writers and the distribution for the time for a writer to take the lock: https://blog.skypilot.co/abusing-sqlite-to-handle-concurrenc...

zhwu · on July 11, 2024

Dealing with all the Kubernetes pod configs / deployments is too much for an AI engineer. Being able to focus on the real model work would be super important.

zhwu · on Aug 2, 2023

The finetuning can tailor the model to have more customized knowledge, just like the identity knowledge of itself shown in the blog post. If you ask the original llama model, it should know nothing about SkyPilot or Vicuña, as it is trained on old knowledge from the internet.

However, finetuning still cannot get rid of the hallucination problem that all the chatbot suffers from. It depends on how accurate you expect the chatbot should be. The retrieval might be considered more accurate, as it will not make up solutions, but just return irrelevant answer in the worst case.

zhwu · on Aug 2, 2023

Great reference!

Just want to add about hosting your own LLM vs using ChatGPT. Cost is definitely a thing to consider, but it also depends on whether it is ok to share the requests to your product with OpenAI.

Also, something you cannot do with ChatGPT is to custom it with your own data, such as internal documents, etc. As shown in the blog, the model trained by ourselves can easily know its identity.

zhwu · on Aug 2, 2023

It is the underlying operational guide of the latest release of Vicuna-1.5: https://twitter.com/lmsysorg/status/1686794639469371393

zhwu · on July 19, 2023

This is cool! The Llama 2-70B can be hosted in my own cloud environment.

zhwu · on May 25, 2023

It seems training the Vicuna on custom dataset could be quite easy as well, according to the following: https://github.com/skypilot-org/skypilot/tree/master/llm/vic...

zhwu · on May 25, 2023

Very interesting! Quite surprised to see PaLM-2 ranked even lower than open-sourced Vicuna.