Nice list! I'd say SQLite with WAL is the biggest money saver mentioned.
One note: you can absolutely use Python or Node just as well as Go. Hetzner, for example, offers machines with 4GB RAM, 2 vCPUs, and 10TB of traffic (then $1/TB egress) for $5/mo.
Two disclaimers for VPS:
If you're using a dedicated server instead of a cloud server, just don't forget to back up your DB to a Storage Box often ($3/mo for 1TB; use rsync). It's good practice either way, but cloud instances seem more resilient to hardware faults. Also avoid their object store.
You are responsible for security. I've seen good devs skip basic SSH hardening and get infected by bots in under an hour. My go-to move when I spin up servers is a two-stage Terraform setup: first I set up SSH with only my IP allowed, then I set up Tailscale and shut down the public SSH entrypoint completely.
Personally for backups I’d avoid using a product provided by the same company as the VM I’m backing up. You should be defending against the individual VM suffering corruption of some kind, needing to roll back to a previous version because of an error you made, and finally your VM provider taking a dislike to you (rationally or otherwise) and shutting down your account.
If you’re backing up to a third party losing your account isn’t a disaster, bring up a VM somewhere else, restore from backups, redirect DNS and you’re up and running again. If the backups are on a disk you can’t access anymore then a minor issue has just escalated to an existential threat to your company.
Personally I use Backblaze B2 for my offsite backups because they’re ridiculously cheap, but other options exist and Restic will write to all of them near identically.
> You are responsible for security. I've seen good devs skip basic SSH hardening and get infected by bots in under an hour. My go-to move when I spin up servers is a two-stage Terraform setup: first I set up SSH with only my IP allowed, then I set up Tailscale and shut down the public SSH entrypoint completely.
Note that you don't need all of that to keep your SSH server secure. Just having a good password (ideally on a non-root account) is more than enough.
I'd call it unnecessary exposure. Under both modern threat models and classic cybernetic models (see the law of requisite variety), removing as much attack surface as possible is optimal. Disabling passwords in SSH especially is infosec 101 these days: no need to worry about brute-force attacks, credential stuffing, or simple human error, which was the cause of every attack I've seen directly.
It's easier to add a small bit of config to Terraform and make your setup at least key-based.
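For reference, the sshd_config lines that enforce this are short (a minimal sketch, assuming a reasonably recent stock OpenSSH; adjust to taste):

```
# /etc/ssh/sshd_config -- keys only, no passwords
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password
```

Then reload sshd, and keep a second session open while testing so you don't lock yourself out.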
Once I had a PostgreSQL DB with a default password on a new VPS, forgot to disable password-based login, and the server had no domain. It got hacked within a day and was being used as a bot server. And that was 10 years ago.
Recently I deployed a server and was getting SSH login attempts within an hour, and it didn't have a domain either. Fortunately, I'd learned my lesson and turned off password-based login as soon as the server was up and running.
And similar attempts once bogged my desktop down to a halt.
Having a machine open to the world is now very scary. Thank God services like Tailscale exist.
I need more info about devs getting infected over SSH in less than an hour. Unless they had a comically weak root password or left VNC exposed, I don't believe it at all.
Yes, <1h was a weak root password. All the attacks I've seen directly were user error. The point is removing attack surfaces entirely rather than hardening needlessly exposed internet-facing protocols.
> Nice list! I'd say SQLite with WAL is the biggest money saver mentioned.
Funny you should say that. I migrated an old Django web site to a slightly more modern architecture (docker compose with uvicorn instead of bare-metal uWSGI) the other day, and while doing that I noticed it doesn't need PostgreSQL at all. The old server already had it installed, so it was the lazy choice.
I just dumped all data and loaded it into an SQLite database with WAL and it's much easier to maintain and back up now.
Does WAL really offer multiple concurrent writers? I know little about DBs, and from a couple of Google searches people say it allows concurrent reads while a write is happening, but not concurrent writers.
Not everybody says so... So, can anyone explain what's the right way to think about WAL?
No, it does not allow concurrent writes (with some exceptions if you get into it [0]). You should generally use it only if write serialisation is acceptable. Reads and writes are concurrent except for the commit stage of writes, which SQLite tries to keep short but is workload- and storage-dependent.
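If it helps, here's a minimal sketch in Python demonstrating both halves of that: a reader seeing the last committed snapshot while a write transaction is open, and a second writer being blocked (the file path and timeout are arbitrary):

```python
import os
import sqlite3
import tempfile

# Throwaway database file just for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

w = sqlite3.connect(path)
w.execute("PRAGMA journal_mode=WAL")
w.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
w.execute("INSERT INTO t (v) VALUES ('committed')")
w.commit()

# Open a write transaction and leave it uncommitted.
w.execute("BEGIN IMMEDIATE")
w.execute("INSERT INTO t (v) VALUES ('uncommitted')")

# A second connection still reads the last committed snapshot.
r = sqlite3.connect(path)
rows = r.execute("SELECT v FROM t").fetchall()  # [('committed',)]

# But a second writer cannot start: WAL allows only one writer at a time.
w2 = sqlite3.connect(path, timeout=0.1)
blocked = False
try:
    w2.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

w.commit()  # releasing the write lock lets the next writer in
```

In an app you'd normally just set a busy timeout and let writers queue, rather than handle the lock error by hand.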
Now this is a more controversial take, and you should always benchmark against your own traffic projections, but: if you don't have a ton of indexes, SQLite's raw throughput is so good that for many access patterns you'd already have to shard a Postgres instance before SQLite's single-writer limitation became the bottleneck.
Thanks! I run SQLite in "production" too (is it production if you have no visitors?) with WAL mode enabled, but I had to work around concurrent writes, so I was really confused. I may have misunderstood the comments.
The first step is to get SSH set up correctly, and the second is to enable a firewall blocking incoming connections on everything except the key ports (SSH, ideally on a different port, plus HTTP/HTTPS). This immediately eliminates a swathe of issues!
Historical reliability and compatibility. They claimed they were S3 compatible, but they required deprecated S3 SDKs, and advanced S3 features are unimplemented (though at least they document it [0]). There were constant timeouts for object creation and updates, very slow speeds, and overall instability. Even now, if you check r/hetzner on Reddit, you'll see it's a reliability nightmare (take that with a grain of salt, though; nobody reports the absence of problems). Less relevant for DB backups, but billing is dumb too: even if you upload a 1KB file, they charge you for 64KB.
At least with a Storage Box you know it's just a dumb storage box, and you can SSH, SFTP, Samba, and rsync to it reliably.
I have a few tricks for handling procrastination that are in this ballpark:
1. When I catch myself wanting to procrastinate, I ask: 'If I follow this feeling, will it increase my power (i.e. capacity/agency/utility) or decrease it?' Then I have a dialogue with myself: either 'Nope, let's refocus; maybe try reading things out loud, drawing a diagram, or some other perspective change', or 'Yeah, I should stop for now and do something else, as long as that increases my power.'
2. I observed that procrastination really is tied to novelty, quite similar to how it's presented in the article, so I did this: instead of going on YouTube or playing games, I started doing typing exercises online. After some time I realised I could get better at typing and get some extra novelty by typing out an existing book! So I have a Tampermonkey script that, whenever I try to go to a random typing website, redirects me to a website where I can type books (I could publish it as a gist if anyone's interested). It stores in Local Storage which page I reached and where I left off. I got to read On the Origin of Species this way, and my typing went from 80 WPM to around 100 WPM.
Quite a lot. And I'd say I process more info by typing than by simply reading. I typed the first edition and got the printed second edition afterwards.
I always searched for videos of what he was describing and found quite amazing material for many of them (enslaver ants, ants tickling aphids, honeycomb construction).
I was super impressed to hear about Darwin's peers, whom he calls out by name every time, and how there were people specialised in breeding races and judging what constitutes a species.
I was kind of stunned to find out that people didn't know dogs were all one species, and how hard it was even for specialised breeders to notice that their pigeons were changing, since they weren't really taking pictures.
And how Darwin published a book approachable to common folks, built on mountains of hand-collected data.
There's so, so, so much more I could talk about (tree of life, organs, descendant resemblance happening at the same age, embryology weirdness) but biggest mind-fuck would be the anti-teleological stance he holds. Basically, out of nowhere (although I saw that he read Hume [0]), Darwin figures out that things don't happen 'for a reason'. Things don't live because they're 'better'. All the creatures we see today are simply the things that survived. There's no final goal, no 'ought to be' in the world. We're simply patterns that survive that resemble patterns that happened to survive.
I also sometimes like to use typing test pages to kinda warm myself up before I start a project. I want to do really well and race someone else or type faster than I usually can, then ride the high of victory to get something done or break procrastination. But this is a much better way to do things; this way I can make that activity help advance my other goals.
The author calls it a 'joke' that Heroes are just unpaid Amazon employees, but reality doesn't become a joke just because it's funny. The asymmetry here is staggering. I find myself holding back private research because I don't want to provide free R&D for a value-extraction machine that is already efficient enough.
The author was at least dependency-driven in their contribution, but outside that kind of dependency, it's hard to justify contributing even 'in the open' when the relationship is this one-sided. Amazon in particular has done enormous damage to the economic assumptions that permissive open source once relied on. More and more projects are adopting 'Business Source Licenses' precisely to prevent open work from becoming a free input into hyperscaler monetization.
These devs know Amazon is grabby, and at some point the only dominant outcome downstream of their community contribution is unpaid labor for a trillion-dollar entity that also diverts support and community engagement away from the original projects by funneling users into managed versions of the same software.
I am saying this is exactly what's happening, just in more robust language. If you disallow Amazon, maybe a third party offers your services to Amazon. So Amazon-the-string is not the bogeyman; the concern is the resale or hosted-service arrangement they can access.
So you see formulations that target infrastructure resale rather than specific entities, such as:
"For the avoidance of doubt, the following scenarios are not permitted under the license:
* A managed service that lets third party developers ... register their own [SERVICE] service endpoints and invoke them through that managed service."
"You may not provide the software to third parties as a hosted or managed service, where the service provides users with access to any substantial set of the features or functionality of the software."
"If you make the functionality of the Program or a modified version available to third parties as a service, you must make the Service Source Code available via network download to everyone at no charge, under the terms of this License [...] where 'Service Source Code' is defined broadly to include the entire hosting stack (monitoring, backups, etc.) to ensure a level playing field"
> I find myself holding back private research because I don't want to provide free R&D for a value-extraction machine that is already efficient enough.
If someone wants to release technology in a way that makes it publicly viewable but restricts its use, they can do that.
If they don't want to release it, they don't have to.
Additionally, publicly released technology destroys patentability, if that's the objective.
I don't understand what one would want to achieve that can't be achieved here.
> If you disallow Amazon, maybe there is a third party that offers our services to Amazon. So Amazon-the-string is not the bogeyman; the concern is the resale or hosted-service arrangement they can access
That's some acrobatics I suspect Amazon won't engage in, because communicating to the customer that your FooBarDB is managed in AWS but hosted by a third party is awkward.
Amazon will happily reimplement your API with their backend, as they've done before.
AFAICT, large SaaS players can simply implement the software interfaces regardless of Business Source Licenses, like what happened to Redis, no? Or is there some specific protection for API surfaces that I'm not aware of? I vaguely recall Google v. Oracle almost established some protections, but then it got deferred in a later ruling. My memory is hazy on that, though...
Indeed. And with the frontier AI models it's worse than that. You can literally just have them write test cases for the product you want to clone, then set it loose reverse engineering the code base.
That said, all these models are trained on the open source code bases presumably, so it would be interesting to see if AI-blackbox reverse engineering actually holds up in court.
My gut says it would in fact hold up in current US courts, but only because the lion's share of corporations want it to and the courts have been stacked in their favor.
I personally believe it should not and that AI code should NOT be considered a "clean room" method. That said, IANAL.
> There's increasingly more projects adopting 'Business Source Licenses', precisely to prevent open work from becoming a free input into hyperscaler monetization.
They could use AGPL or GPLv3; typically those licenses are verboten at hyperscalers.
The truth is that the sort of company opting for a BSL never really wanted to do OSS, and only did so for the optics, for the goodwill it buys among developers, etc.
I know this is true of AGPL, but GPLv3? I thought the people who objected to GPLv3 were those distributing software to their users (it was e.g. a reason Apple switched from bash to zsh). I cannot think of anything in GPLv3 that would be a problem for hyperscalers.
> They could use AGPL or GPLv3; typically those licenses are verboten at hyperscalers.
Laws are only as good as their enforcement, in business at least. Unfortunately, I have seen first-hand that no one cares about licensing if they won't get caught.
Business licenses are good because you can offer support and other benefits to encourage payment.
The claim is that those licenses are deemed no-touch within those companies; it's the companies themselves that insist on their business and the software not mixing, e.g. Apple continuing to ship old versions of GNU programs like Bash and eventually moving to zsh rather than providing updated GPLv3 versions.
Neither GPLv3 nor AGPLv3 say anything about businesses not being able to use the software.
Hey, nothing wrong with closed source, BSL, etc. I am fine with it. I am the last person that will say someone should give out their work for free.
What I object to is companies releasing software with permissive licenses, and then getting butthurt that others profit from it, or trying to rug pull the permissive licenses after a community adopted and contributed to it.
If you want to play the OSS game, then play it right.
I'm "lucky" to not be smart enough or important enough to think about this. Regardless, i wholeheartedly agree -- at this point, anything i personally could release publicly, will either be fully open source, or completely private. And I'm only choosing open source if I'm relatively sure it's not gonna make some asshole tons of money.
That's in the ballpark of how big corps use open source strategically: they try to kill everyone else's value-extraction moat at every layer other than the ones they dominate.
So they commoditize their complement [0]. They don't care if you make money off their OSS, as long as you race to the bottom against everyone else who also has access to it, turning anything but the corp's profit center into a ubiquitous commodity. So they make the "asshole"'s incentives line up with their own.
That link was a great read and makes a strong point! Another reason corps invest in OSS is to develop something they rely on - special driver, etc - and capitalizing on that in the form of OSS maintainers charging consulting fees has been successful. Exactly in agreement with making the incentives line up with their own.
You could wrap pyobject via a proxy that controls context and have AI have a go at it.
You can customise that interface however you want, have a stable interface that does things like
This way you get a general interface for AI interacting with your data, while still keeping a very fluid interface.
Built a custom kernel for notebooks with PDB and a similar interface, the trick is to also have access to the same API yourself (preferably with some extra views for humans), so you see the same mediated state the AI sees.
By 'wrap' I mean build a capability-based, effect-aware, versioned-object system on top of objects (execs and namespaces too) instead of giving models direct access. Not sure if your specific runtime constraints make this easier or harder. Does this sound like something you'd be moving towards?
Really interesting idea! Part of the ethos here is that models are already really good at writing Python, and we want to bet on that rather than mediate around it. Python has the nice property of failing loudly (e.g., unknown keywords, type errors, missing attributes) so models can autocorrect quickly. And marimo's reactivity adds another layer of guardrails on top when it comes to managing context/state.
Anecdotally working on pair, I've found it really hard to anticipate what a model might find useful to accomplish a task, and being too prescriptive can break them out of loops where they'd otherwise self-correct. We ran into this with our original MCP approach, which framed access to marimo state as discrete tools (list_cells, read_cell, etc.). But there was a long tail of more tools we kept needing, and behind the scenes they were all just Python functions exposing marimo's state. That was the insight: just let the model write Python directly.
So generally my hesitation with a proxy layer is that it risks boxing the agent in. A mediated interface that helps today might become a constraint tomorrow as models get more capable.
Yeah, I'm talking more about a wrapper over the python data model (pyobject) rather than an MCP-style API for kernel interaction. I'm not proposing you abstract interactions under a rigid proxy, but that you can use proxy objects to virtualise access to the runtime. You could still let the model believe it is calling normal python code, but in actuality, it goes via your control plane. Seeing the demo I'd imagine you already have parts of this nailed down tho.
Ah, I think I misread your earlier comment. That's a more interesting version of the idea than what I responded to. We don't do this today, but marimo's reactivity already gives us some control plane benefits without virtualizing object access. That said, I can imagine there are many more things a proxy layer could do. Need to think on it, thanks for the clarification :)
Codex just picks it up. The surface is basically a guarded object model, so pandas/polars-style operations stay close to the APIs the model already knows. There are some extra tricks, but they're probably out of scope for an HN comment.
In practice, Pandas/Polars API would lower to:
proxy -> attr("iloc") -> getitem(slice(1,10,None))
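A minimal sketch of such a recording proxy in plain Python (`Proxy` and `audit_log` are illustrative names, and I'm using a plain list instead of a DataFrame to keep it dependency-free): every attribute and item access passes through the control plane before being forwarded to the real object.

```python
audit_log = []  # the "control plane": here it just records accesses

class Proxy:
    def __init__(self, target, path="obj"):
        self._target = target
        self._path = path

    def __getattr__(self, name):
        # e.g. df.iloc lowers to attr("iloc")
        audit_log.append(("attr", self._path, name))
        return Proxy(getattr(self._target, name), f"{self._path}.{name}")

    def __getitem__(self, key):
        # e.g. df.iloc[1:10] lowers to getitem(slice(1, 10, None))
        audit_log.append(("getitem", self._path, key))
        return Proxy(self._target[key], f"{self._path}[{key!r}]")

    def unwrap(self):
        return self._target

data = Proxy([10, 20, 30, 40])
sliced = data[1:3]
print(sliced.unwrap())  # [20, 30]
print(audit_log[-1])    # ('getitem', 'obj', slice(1, 3, None))
```

A real version would add policy checks (deny, rewrite, version) at those two choke points instead of just logging, but the model still "believes" it's calling normal Python.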
Language is a tool, you have to do what's best for your own goal.
If you read Orwell, his message is not necessarily that complex language is worse at transmitting ideas, as he's actually arguing that complex language can hide the speaker's real motivation and deceive more easily.
For Paul Graham, I'd say 'write like you talk' is very good advice since he's interacting with founders whose first language is not English, people with different backgrounds from his, and young folks who maybe didn't take an academic route, so for him it checks out.
Leslie Lamport always talks about how you should write down what you think: until you write something down, you only think you're thinking. He's also all about writing most things in math rather than English, since math is less ambiguous (and less complex). And I'd say math is quite different from how you talk.
Now, notice how you can have different motivations for the same behaviour (Orwell and Graham), or different behaviours for a similar motivation (Orwell and Lamport). Maybe more interestingly, think about people with the opposite intentions: a contractor who wants to mimic sophistication to win a contract with a bank (whose representatives are also mimicking sophistication), or guilds trying to preserve a high barrier to entry. They'd appreciate the opposite advice, since their goals differ.
The 99.9% is less impressive than you'd think: in the remaining 0.1% of cases they're not even keeping the same program behaviour. They also mention an AST in the pipeline, not a CST, so I wouldn't expect source preservation to be a direct goal.
Also, if you use nonstandard spacing, I'd say it's on you to preserve a mechanical source-AST mapping if you want to use any tool that does dataflow analysis and transforms.
As a side note, comments are much trickier than non-standard spacing if their positioning is semantic.
No. JSIR is primarily for JS -> IR -> JS for analysis and source-to-source transformation. It's not a ready-made bridge for emitting other languages.
You could use it as an intermediate form in a JS->C# pipeline, but you'd still have to define a subset of JavaScript that lowers cleanly to your target C# runtime and implement the IR->C# lowering yourself.
I'd imagine the hard part is not the IR but aligning JavaScript semantics (object model, closures, prototypes, etc.) with C#'s (static type system, different execution model...).
Right on. That makes sense. Thanks for spelling it out!
I do think aligning the semantics will be the easier part, honestly, because I'm only trying to transpile the source supported by the game engine. That's all written in TypeScript, and I'm not guaranteeing full parity for arbitrary TS/JS (only source that parses the same way the game engine parses it), so I'm expecting a near 1-to-1 conversion. I started by writing everything in C# and copied the structure to JS, knowing this was the eventual plan, so the JS can actually be rewritten as C# with a pretty simple regex tokenizer.
My hope here was that by morphing the code into an IR, the IR would be some kind of well-known IR that, for instance, C# could also be morphed into, and would therefore allow automatic conversion back and forth. From what you're saying, though, it sounds like IRs don't share a common structure for describing code (I'm guessing because of the semantic misalignment you mention between a wide variety of paradigms?), so this would only work if I wrote the IR->C# mapping myself, which would be just as complex (or more so) than regexing my JS into C#. If I've got that right, that's a bummer, but understandable. If I'm wrong, though, happy to learn more!
I don't see anything wrong that would disqualify your plan.
But if the alternative is regex and you're already writing TypeScript, take a look at ts-morph [0]. TS has very good compiler APIs, and that gets you something much safer than text-based replacement while staying relatively small for a constrained subset; ts-morph wraps those APIs cleanly.
Btw, JS doesn't even have an official bytecode: the spec is defined at the language-semantics level, so each engine/toolchain invents its own internal representation.
I think the WASM world is a clear example that bridges the gap you're describing.
You usually compile from SSA to WASM bytecode, and then immediately JIT (Cranelift) by reconstructing an SSA-like graph IR. If you look at the flow, it's basically:
Graph IR -> WASM (stack-based bytecode) -> Graph IR
So the stack-based IR is used as a kind of IR serialization layer. Then I realized that this works well because a stack-based IR is just a linearized encoding of a dataflow graph. The data dependencies are implicit in the stack discipline, but they can be recovered mechanically.
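A small Python sketch of that recovery, assuming a toy three-instruction stack language (illustrative, not real Wasm opcodes): popping operands off a simulated stack rebuilds the def-use edges directly.

```python
def to_dataflow(instrs):
    """Linear stack program -> list of (node_id, op, operands) triples.

    The stack discipline encodes the dataflow implicitly: each pop
    tells you exactly which earlier definition an operand uses.
    """
    stack, nodes = [], []
    for instr in instrs:
        op = instr[0]
        if op == "const":
            nodes.append((len(nodes), "const", (instr[1],)))
        else:  # binary op: pop two operands, result goes on the stack
            rhs, lhs = stack.pop(), stack.pop()
            nodes.append((len(nodes), op, (lhs, rhs)))
        stack.append(len(nodes) - 1)
    return nodes

# (2 + 3) * 4 in stack form:
program = [("const", 2), ("const", 3), ("add",), ("const", 4), ("mul",)]
graph = to_dataflow(program)
# graph[2] == (2, "add", (0, 1)) and graph[4] == (4, "mul", (2, 3)):
# the def-use chains are back, ready for graph-IR-style rewrites.
```

Real Wasm adds locals, control flow, and side effects on top of this, but the core round trip for straight-line expression code is exactly this mechanical.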
Once you see that, the blindness mostly disappears: the difference between SSA/graph IRs and expression/stack-based IRs is about how the dataflow (mostly def-use chains) is represented, not about which optimizations are possible.
From there it becomes fairly obvious that graph-IR techniques can be applied to expression-based structures as well, since the underlying information is the same, just represented differently.
I didn't look closely enough at JSIR, but from looking around (and from building a restricted source <-> graph IR on JS for some code transforms), it basically shows you have at least a homomorphic mapping between expression-oriented JS and a graph IR, if not a proper isomorphism (at least in structured, side-effect-constrained subsets).
Only compilers that already had an SSA-based pipeline transform SSA to stack-based for Wasm. And several don't like that they have to comply with Wasm structured control flow (which, granted, is independent from SSA). Compilers that have been using an expression-based IR directly compile to Wasm without using an SSA intermediary.
I was imprecise; I was specifically thinking of already-SSA-based tech.
My broader point is that for SSA-based pipelines targeting Wasm, translation between SSA/graph IR and a stack-based IR is largely mechanical and efficient. Whether a compiler uses SSA as an intermediary or goes straight from an AST to Wasm, the fact remains that you can round-trip between an SSA-like IR and a stack-based IR without losing the underlying dataflow information.
Yeah, the mapping is not canonical and some non-semantic structure is not preserved (evaluation order, materialization points, join encoding, CFG reshaping for structured control, and probably more structure I'm not familiar with), but optimization power is unaffected.
And JSIR seems to be based on an even stronger assumption.
Would appreciate corrections if you see things differently.