skybrian · 2026-05-04T02:18:56 1777861136

Why do they put everything into one huge bucket? Wouldn't the best way to clean it up be to create more buckets?

sagiba · 2026-05-04T08:03:37 1777881817

You can have lots of buckets, but each one typically still contains many datasets.

Think of a team doing ML, for example. They work with data all day across many different tools, each reading some inputs from S3 and writing outputs to S3. They won't create a bucket for every output, that's not practical. So they write to a single bucket with outputs organized under prefixes.

Buckets are more of an administrative boundary (IAM, cost, replication) than a data organization unit. So even with more buckets, the dataset abstraction is still missing - there's no good native way to track what a prefix represents, who created it, whether it's still accessed, how much it costs, etc.
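To make the "prefix as dataset" idea concrete, here is a minimal sketch of rolling a flat object listing up into per-prefix totals. Everything in it is an assumption for illustration: the `(key, size, last_modified)` tuples stand in for whatever a bucket listing or inventory report returns, and `summarize_by_prefix`, `PrefixStats`, and the `depth` parameter are hypothetical names. It only approximates size and last write per prefix; it says nothing about who created the data or whether anyone still reads it, which is the gap described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PrefixStats:
    object_count: int = 0
    total_bytes: int = 0
    last_modified: str = ""  # ISO 8601 string; compares correctly as text


def summarize_by_prefix(objects, depth=2):
    """Group (key, size, last_modified) tuples by their first `depth` path segments."""
    stats = defaultdict(PrefixStats)
    for key, size, modified in objects:
        # Drop the object name, then keep at most `depth` leading "directories".
        dirs = key.split("/")[:-1][:depth]
        prefix = "/".join(dirs) + "/" if dirs else ""
        entry = stats[prefix]
        entry.object_count += 1
        entry.total_bytes += size
        entry.last_modified = max(entry.last_modified, modified)
    return dict(stats)


# Hypothetical listing; a real one would come from a bucket listing or inventory export.
objects = [
    ("team-a/features/v1/part-0.parquet", 1000, "2026-01-10T00:00:00"),
    ("team-a/features/v1/part-1.parquet", 1500, "2026-01-11T00:00:00"),
    ("team-a/models/run-7/weights.bin", 4000, "2026-03-02T00:00:00"),
    ("scratch/tmp.csv", 10, "2025-12-01T00:00:00"),
]

summary = summarize_by_prefix(objects)
for prefix, s in sorted(summary.items()):
    print(prefix, s.object_count, s.total_bytes, s.last_modified)
```

Choosing `depth` is itself a guess about where one dataset ends and the next begins, which is exactly the information the bucket does not record.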

skybrian · 2026-05-03T18:28:40 1777832920

Latency matters if you want programmers to be able to work interactively. We multitask because LLMs are too slow.

That's happened before, with people submitting batch jobs in the mainframe era, people working on big projects that take an hour to compile, or people waiting for code reviews.

However, even if the LLMs become fast, the coding agent will likely bottleneck on running tools. You will need to keep your tests fast, too.

skybrian · 2026-05-03T04:23:31 1777782211

I agree, and add:

You don't need to put up a marketing page that tries to convince people to use your software. Instead (or as well), consider explaining all the reasons why someone should not use your software. More users, more problems.

skybrian · 2026-05-02T22:54:14 1777762454

They didn't make a clear argument in favor of that architecture and I'm not really convinced.

On exe.dev the agent (Shelley) runs in a Linux VM, which is the security boundary. All the conversations are saved to a sqlite database, and it knows how to read it, so you can refer to a previous conversation in the database. It's also handy for asking the AI to do random sysadmin stuff, since it can use sudo.

A downside is that there's nowhere in the VM where secrets are safe from possibly getting exfiltrated via an injection attack. But they have "integrations" where you can put secrets into an http proxy server instead of having them locally.

Also, you don't need to use AI at all. You can use the VM as a VM.

ramraj07 · 2026-05-03T10:09:08 1777802948

No matter how smart you think you get, I personally don't trust the models in any high-volume production environment where they can read the secrets one way or another.

skybrian · 2026-05-02T19:58:04 1777751884

Ideally Roblox would be able to rely on the platform to tell them whether the device is child-locked or not. It would be up to parents to make sure their kids only have access to devices with appropriate locks turned on. Parents could rely on vendors to make devices where it’s easy to set appropriate locks, and rely on stores not to sell unlocked devices to kids.

But we don’t live in that world.

Also, they are trying to prevent adults from pretending to be kids, which is much harder than preventing kids from accessing adult sites.

watwut · 2026-05-04T07:52:46 1777881166

> Ideally Roblox would be able to rely on the platform to tell them whether the device is child-locked or not. It would be up to parents to make sure their kids only have access to devices with appropriate locks turned on

Yes, for people who make a living selling tech, it is ideal when parents have to buy a separate device for everyone. But people who do not make a living selling tech prefer one or two tablets for the whole family; it is cheaper, you know.

And being cheap is one of the reasons for Roblox's popularity. Kids could play it without buying anything (except that tablet).

Aurornis · 2026-05-02T23:15:42 1777763742

This is an interesting comment because there’s a parallel effort to shift age verification to devices, which draws a lot of hate here.

subscribed · 2026-05-03T00:49:46 1777769386

Because almost universally it's not privacy-preserving age verification, but permanently deanonymizing identification.

Please, let's keep it accurate.

ytoawwhra92 · 2026-05-04T00:21:13 1777854073

> permanently deanonymizing identification

As is showing your ID to a bartender.

watwut · 2026-05-04T07:53:27 1777881207

That is temporarily deanonymizing identification. The bartender is pretty much guaranteed to forget in 10 minutes.

skybrian · 2026-05-02T18:55:00 1777748100

It seems pretty reasonable to me that when you're not driving, the car is basically a taxi and the taxi service is to blame for any mistakes. The car manufacturer isn't just making cars anymore. It's providing a service.

Perhaps they could sell the car to a different taxi service, though?

skybrian · 2026-05-02T18:37:40 1777747060

Those profits, if they happen, will be delayed, so it means they aren't worth quite as much.

skybrian · 2026-05-02T18:34:27 1777746867

Just about anything can be claimed to maximize shareholder profits in the long term. This is an illustrative example of how it's done.

Whether it actually turns out that way is another question.

skybrian · 2026-05-02T18:27:16 1777746436

Although most of the real-world data is probably boring, collecting more of it should make discovering rare edge cases more likely. But since they happen rarely, I imagine that after discovering them, they would then need to figure out how to simulate them.

skybrian · 2026-05-01T00:24:48 1777595088

Databases often use table statistics to try to do better at generating query plans. I wonder if they use them to make indexes faster as well?

10000truths · 2026-05-01T01:18:22 1777598302

The planner's cost estimate is a crude approximation of the actual query cost. Sometimes, the query planner makes a terrible guess. Your resident DBA won't appreciate occasionally being paged at 3 AM on a Sunday. A good strategy is to freeze the query plan once you have a sufficient sample size of data in the involved tables.

crazygringo · 2026-05-02T14:42:24 1777732944

Unfortunately, "freezing the query plan" isn't something available in many popular databases. It's not supported by e.g. Postgres, MySQL, or SQLite.

10000truths · 2026-05-04T01:51:50 1777859510

Perhaps not every aspect of the query plan can be dictated, but both MySQL and Postgres (with pg_hint_plan) allow you to specify hints that enforce a specific join order and scan behavior for the tables in your query, which is where the majority of "unexpected change in query plan" problems will arise. As for SQLite, I'm less familiar with the knobs available for query tuning, but a cursory Google search tells me that join order is respected when using CROSS JOIN, and index usage can be forced with INDEXED BY/NOT INDEXED.
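For the SQLite part, a small sketch using Python's built-in `sqlite3` module shows how those knobs appear in `EXPLAIN QUERY PLAN`. The `users`/`orders` schema, the index name `idx_orders_user`, and the `plan` helper are made up for illustration, and the exact wording of the plan text differs between SQLite versions, so treat this as something to experiment with rather than a tuning recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    CREATE INDEX idx_orders_user ON orders(user_id);
""")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# INDEXED BY: the statement fails to prepare if this index cannot be used.
forced_index = plan(
    "SELECT total FROM orders INDEXED BY idx_orders_user WHERE user_id = 1"
)

# NOT INDEXED: forbid index use for this table reference.
no_index = plan("SELECT total FROM orders NOT INDEXED WHERE user_id = 1")

# CROSS JOIN: SQLite keeps the left table as the outer loop instead of reordering.
pinned_join = plan(
    "SELECT u.name, o.total FROM users AS u CROSS JOIN orders AS o "
    "WHERE o.user_id = u.id"
)

for label, steps in [("INDEXED BY", forced_index),
                     ("NOT INDEXED", no_index),
                     ("CROSS JOIN", pinned_join)]:
    print(label, "->", steps)
```

This pins individual choices inside one statement; it is not the same as freezing a whole plan, since anything left unhinted can still change as the data or the SQLite version changes.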