Hacker News | new | past | comments | ask | show | jobs | submit | adrian17's comments

So the repo builds:

- a C library

- a Neovim plugin

- an MCP server

But not a plain binary, which is the main way ripgrep is used directly (...at least by humans) and compared against.


Because it's meant to be used by the long-running SDK, not for one-shot searches (that's where all the optimizations come from).

> Epic did it backwards — they built the game first, then tried to force the infrastructure (EGS) into existence with money.

Didn't Valve push Steam through HL2? It's a different kind of forcing of course, but still.


That just makes people install it; it doesn't make people buy anything. I have installed the Epic store, but I've never even thought about buying a game there. I installed it since they give me free games, so I'd assume most gamers have installed the Epic store too. But getting people to install it isn't enough.

It already has a fast path, from (I think) 3.11. If you run `object.x` repeatedly on the same type of object enough times, the interpreter will swap out the `LOAD_ATTR` opcode for `LOAD_ATTR_INSTANCE_VALUE` or `LOAD_ATTR_SLOT`, which only checks that the type is the same as before and loads the value from a known offset, without doing a full lookup.

That's not true. I mean: it's true that it has little to do with OOP, but most imperative languages (the only exception I know of is Rust) have the same issue; it's not Python-specific. For example (https://godbolt.org/z/aobz9q7Y9):

    #include <cstdio>

    struct S {
        const int x;
        int f() const;
    };

    int S::f() const {
        int a = x;
        printf("hello\n");
        int b = x;
        return a - b;
    }

The compiler can't reuse `x` unless it's able to prove that it definitely couldn't have changed during the `printf()` call, and here it can't prove that, so the member is loaded twice. C++ compilers can usually only prove it for trivial code with completely inlined functions that don't mutate any external state, or that mutate in a definitely-not-aliasing way (strict aliasing). (And the `const` makes no difference here at all.)
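Conversely, a sketch of the usual workaround: copy the member into a local first. A stack local can't alias anything `printf()` might write to, so the compiler keeps the single load and can fold the subtraction away entirely:

```cpp
#include <cstdio>

struct S {
    const int x;
    int f() const;
};

int S::f() const {
    int local = x;          // single load; a stack local can't alias
                            // anything printf() touches
    printf("hello\n");
    return local - local;   // the compiler can fold this to 0
}
```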

In Python the difference is that it can basically never prove it at all.
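A minimal sketch of the Python side of this: any opaque call between two attribute reads can rebind the attribute, so the interpreter has to reload it every time (the names here are illustrative, not from any real codebase):

```python
class C:
    pass

def mutate(c):
    # Opaque to the caller: rebinds the attribute mid-function.
    c.x = 2

def f(c):
    a = c.x
    mutate(c)   # the interpreter can't assume c.x is unchanged...
    b = c.x     # ...so this must be a fresh lookup
    return a - b

c = C()
c.x = 1
print(f(c))  # -1: the second load observed the mutation
```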


I've been occasionally glancing at the PR/issue tracker to keep up to date with things happening with the JIT, but I've never seen where the high-level discussions were happening; the issues and PRs always jump right into the gritty details. Is there a high-level introduction/example anywhere of how trace projection vs. trace recording work and differ? Googling the terms often returns the CPython issue tracker as the first result, and the repo's jit.md is relatively barebones and rarely updated :(

Similarly, I don't entirely understand refcount elimination; I've seen the codegen difference, but since the codegen happens at build time, does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs? With so many opcodes and their specialized variants, how many stencils are there now?


Occasionally the core.py podcast will cover updates and higher-level stuff:

https://open.spotify.com/show/1PGRfdrLEwgXjQbPBNk1pW

(Pablo and Łukasz)


> I've never seen where the high level discussions were happening

Thanks for your interest. This is something we could improve on. We were supposed to document the JIT better in 3.15, but right now we're crunching for the 3.15 release. I'll try to get to updating the docs soon if there's enough interest. PEP 744 does not document the new frontend.

I wrote a somewhat high-level overview here in a previous blog post https://fidget-spinner.github.io/posts/faster-jit-plan.html#...

> does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs?

This is a great question; the answer is: not exactly! The key is to expose the refcount ops in the intermediate representation (IR) as one single op. For example, BINARY_OP becomes BINARY_OP, POP_TOP (DECREF), POP_TOP (DECREF). That way, instead of optimizing n operations, we just need to expose the refcounting of those n operations and optimize only one op (POP_TOP). Thus, we just need to refactor the IR to expose refcounting (which was the work I divided up among the community).

If you have any more questions, I'm happy to answer them either in public or email.


I saw your documentation PR, thank you!

I also did some reading and experiments, so quickly talking about things I've found out re: refcount elimination:

Previously, given an expression `c = a + b`, the compiler generated a sequence of two LOADs (which increment the inputs' refcounts), then a BINARY_OP that adds the inputs and decrements their refcounts afterwards (possibly deallocating the inputs).

But if the optimizer can prove that the inputs will definitely still have existing references after the addition finishes (like when `a` and `b` are local variables, or when they're immortal, like the `5` in `a+5`), then the entire incref/decref pair can be skipped. So in the new version, the DECREFs that were part of BINARY_OP were split into separate uops, which the optimizer can then transform into POP_TOP_NOP.

And I'm assuming that although splitting an op this much would normally cost some performance (as the compiler can't optimize across the pieces as well anymore), in this case it's usually worth it: the optimization almost always succeeds, and even when it doesn't, the uops are still generated in several variants for the various TOS cache states (which is basically registers), so they still often codegen into just 1-2 instructions on x86.

One thing I don't entirely understand (though this is super specific to my experiment; not sure if it's a bug or a special case): I looked at tier2 traces for `for i in lst: (-i) + (-i)`, where `i` is an object of a custom int-like class with overloaded methods (to control which optimizations happen). When its __neg__ returns a number, I see a nice sequence of

_POP_TOP_INT_r32, _r21, _r10.

But when __neg__ returns a new instance of the int-like class, then it emits

_SPILL_OR_RELOAD_r31, _POP_TOP_r10, _SPILL_OR_RELOAD_r01, _POP_TOP_r10, etc.

Is there some specific reason why the "basic" pop is not specialized for the TOS cache? Is it that it's the same opcode as in tier1 and it's just not worth it since it gets optimized into specialized uops most of the time, or is it that it can't be optimized the same way because the decref can call user code?


Update: I put up a PR to document the trace recording interpreter https://github.com/python/cpython/pull/146110


You’ll probably want to look at the PEPs. Haven't dug into this topic myself, but this looks related: https://peps.python.org/pep-0744/


I think CPython already had tier2 and some tracing infrastructure when the copy-and-patch JIT backend was added; it's the "JIT frontend" that's more obscure to me.


Discussions might be happening on the Python forums, which are pretty active.

https://discuss.python.org/t/pep-744-jit-compilation/50756/8... here's one thing

I do think you can also just outright ask questions about it on the forums and you'll get some answers.

At the end of the day there's only so many people working on this though.


Have you read the dev mailing list? That's where the Python developers discuss a lot of things.


There isn’t a dev mailing list any more, is there? Do you mean the Discourse forum?


UPDATE: I misunderstood the question :-/ You can ignore this.

I love playing with compilers for fun, so maybe I can shed some light. I’ll explain it in a simplified way for everyone’s benefit (going to ignore the stack):

When an object is passed between functions in Python, it doesn’t get copied. Instead, a reference to the object’s memory address is sent. This reference acts as a pointer to the object’s data. Think of it like a sticky note with the object’s memory address written on it. Now, imagine throwing away one sticky note every time a function that used a reference returns.

When an object has zero references, it can be freed from memory and reused. Ensuring the number of references, or the “reference count”, is always accurate is therefore a big deal. Getting it wrong is often the source of memory leaks, but I wouldn’t attribute a speed-up to it (only if it replaces GC, then yes).
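The sticky-note count is observable from Python itself via `sys.getrefcount` (on a standard GIL build; note it reports one extra reference for the temporary created by the call itself):

```python
import sys

obj = object()
base = sys.getrefcount(obj)   # includes the call's own temporary reference

ref = obj                     # write another sticky note
assert sys.getrefcount(obj) == base + 1

del ref                       # throw the note away
assert sys.getrefcount(obj) == base
```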


What at all does this comment have to do with what it's replying to?


I misread the original comment, thinking it was a question about what refcount elimination is, rather than how it affects the JIT's performance(?).


The "drop-in" compatibility claims are also just wrong? I ran it on the old test suite from 6.0 (which is completely absent now), and quickly checking:

- the outputs, even when correctly deduced, are often incompatible: "utf-16be" turns into "utf-16-be", "UTF-16" turns into "utf-16-le", etc. FWIW, the old version appears to have been a bit of a mess (having had "UTF-16", "utf-16be" and "utf-16le" among its outputs), but I still wouldn't call the new version _compatible_,

- similarly, all `ascii` results turn into `Windows-1252`,

- sometimes it really does appear more accurate,

- but sometimes it appears to flip between wider families of closely related encodings: one SHIFT_JIS test (confidence 0.99) turns into cp932 (confidence 0.34), the whole family of tests that were determined as gb18030 (Chinese) are now sometimes determined as gb2312 (the older subset of gb18030), and one even as cp1006, which AFAIK is just wrong.
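For what it's worth, Python's own codec machinery treats several of these spellings as aliases of the same codec, which shows why comparing the raw strings breaks even when the detected encoding is effectively the same (bare "UTF-16" remains distinct, since it implies a BOM):

```python
import codecs

# codecs.lookup() normalizes the spelling and resolves aliases,
# so old-style and new-style names map to one canonical name.
print(codecs.lookup("utf-16be").name)   # 'utf-16-be'
print(codecs.lookup("UTF-16").name)     # 'utf-16'
print(codecs.lookup("ascii").name)      # 'ascii' (not an alias of cp1252!)
```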

As for the performance claims, they appear not entirely false: analyzing all files took 20s, versus 150s with v6.0. However, it looks like the library sometimes takes 2s to lazily initialize something, which means that if you use the `chardetect` CLI instead of the Python API, you pay that cost on every invocation and end up several times slower instead.

Oh, and this "Negligible import memory (96 B)" is just silly and obviously wrong.


FWIW, I don't think there's even room for interpretation here, given that the commit that created the README (and almost all commits since the rewrite started 4 days ago) is authored by

> dan-blanchard and claude committed 4 days ago


Sure, I just could use a break from the needless side tracks.


AFAIK the .fla format was never fully documented or reverse-engineered by anyone (FFDEC has an exporter, but not an importer), so this alone would be a bold claim.


https://ruffle.rs/ is pretty solid


I'm talking about the .fla (XFL) format, not .swf (which is documented well - though that doesn't mean its exact behavior is understood well).

(note: I'm one of Ruffle's maintainers)


ruffle is a player for the output format (swf), .fla is the authoring format


Hi, one of Ruffle's maintainers here. AFAIK we have most of the NetConnection API implemented, but direct socket connections are just impossible in browsers. The games should (hopefully) work and connect when run via the desktop player. We also implemented socket emulation in the browser via WebSockets, so they should also start working there if you put a WebSockify proxy on your server (no need to touch the game server code).


Hi! You have done amazing work, and I'm ever grateful to your team for keeping AS3 alive!

I used sockets in some of my multiplayer games, but that's not where I ran into problems with Ruffle. Since those games only upgraded to sockets after an initial HTTPS connection, I haven't even gotten to the point of trying sockets yet.

I mainly just used NetConnection.connect() for routine API calls, not to open a socket. AFAIK .connect() didn't open a socket, although I guess it had some two-way capabilities with Flash Media Server, but that's not how I used it. I just used it to initialize the NetConnection instance with the URI of a server endpoint that could receive AMF messages (usually translated on the backend with AMFPHP). I don't think it really left any sort of connection open.

After that, you'd just make RESTful calls over that connection using netconnection.call(...args), and could send complex objects - even SQL result sets - back and forth without going through JSON or XML. But it was just a bunch of HTTP calls sending that data in Flash's own serialized format. You'd listen for NetStatusEvent or SecurityEvent to handle the results or errors. No sockets were involved. In conjunction with AMFPHP it was basically like a URLRequest without any structuring or destructuring needed to parse the results into AS3-friendly data types.

It would be amazing if only the RESTful kinds of NC connections and calls could work again through Ruffle, I think it might be all that's stopping my old games from running!


As a Ruffle developer who in my day job maintains some Flash-based websites, I'll note from experience that AMF serialization/deserialization in Ruffle has some definite issues, so that may be the issue for your games (the websites I maintain use https://metacpan.org/pod/AMF::Perl). See https://github.com/ruffle-rs/ruffle/issues?q=is:issue+state:....


As far as I've seen, Ruffle never even makes the call out to the server... so at this point I don't think it's a serialization issue, although some of what's in that list could potentially cause problems. The Ruffle compatibility docs still say that NetConnection has 90% coverage... except for the .connect() call itself, which kinda makes me wonder why bother covering it at all?

https://ruffle.rs/compatibility/avm2


That documentation, for stubs, can be somewhat misleading. It just looks for the presence of an avm2_stub_method call anywhere in the method, which may mean a method that's entirely a stub or, as is the case for NetConnection.connect, a method that is stubbed only under specific conditions. NetConnection.connect is stubbed specifically for non-null, non-HTTP commands (generally RTMP/RTMFP). See https://github.com/ruffle-rs/ruffle/blob/df11c2206bc6be0a329...


Yeah, it's weird but I have an initial API call near the start of a program that makes a NetConnection.call() to an http address. The program should not run at all beyond that point until it gets a result from the server, after which it initializes a bunch of client-side variables and starts the main loop. With Ruffle, I see nothing go over the wire to that http address, but it's as if the client does get a result, because the rest of the program proceeds through the function defined in the Responder and onto the main loop. But it does so as if the server returned an undefined value, so then it just starts throwing errors related to those master values being undefined. Unfortunately there was no error-checking on the client side for that call; it assumed it either got some values or it failed to connect.

Maybe for some reason Ruffle thinks it's not a plain http call. I can start a GH issue if it would help.


Yes please open a GitHub issue and attach all materials needed to reproduce the issue. Thank you!


I am jolted, nearly shocked, that in 2026 you have to maintain some Flash-based websites. Can you share?


I mean, I could decommission them, but they're educational websites related to DNA and bioinformatics with interactive animations, and my boss has a certain fondness for keeping them running if we can; as a nonprofit educational and research institution, we used to get a number of grants that funded creating them in the first place.


> And no-one cares.

Probably because there's no (public) disclosure and no CVE. From what I've googled, there's literally nothing about this aside from the tweet.

