You're right that dynamic typing makes high-frequency execution tricky, and modern OoO cores are incredibly good at hiding latencies.
But PyXL isn't trying to replace general-purpose CPUs — it's designed for efficient, predictable execution in embedded and real-time systems, where simplicity and determinism matter more than absolute throughput.
Most embedded cores (like ARM Cortex-M and simple RISC-V) are in-order too — and deliver huge value by focusing on predictability and power efficiency.
That said, there’s room for smart optimizations even in a simple core — like limited lookahead on types, hazard detection, and other techniques to smooth execution paths.
I think embedded and real-time represent the purest core of the architecture — and once that's solid, there's a lot of room to iterate upward for higher-end acceleration later.
Java is statically typed and a lot saner than Python, and JavaCard is a fairly restricted subset. Apparently real cards don't typically support garbage collection.
IMO JavaCard doesn't really make sense either. There's clearly space for another language here, though I suspect most people would much rather just use Rust than learn a new language.
Good question.
In theory, you can compile anything Turing-complete to anything else — ARM and Python are both Turing-complete.
But practically, Python's model (dynamic typing, deep use of the stack) doesn't map cleanly onto ARM's register-based, statically-typed instruction set.
PySM is designed to match Python’s structure much more naturally — it keeps the system efficient, simpler to pipeline, and avoids needing lots of extra translation layers.
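For a rough sense of why a stack-based ISA is the more natural target: CPython's own bytecode is already stack-oriented, as the standard `dis` module shows (exact opcode names vary a little across Python versions):

```python
import dis

def add(a, b):
    return a + b

# CPython compiles this to stack operations: push a, push b, add the top
# two stack slots, return the top of stack. There are no named registers
# anywhere, which is part of why a register ISA like ARM's is an awkward
# target and a stack ISA maps on much more directly.
dis.dis(add)

ops = [ins.opname for ins in dis.get_instructions(add)]
assert any(op.startswith("LOAD_FAST") for op in ops)  # operands are pushed, not register-allocated
```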
I'm a software engineer by background, mostly in high-frequency trading (HFT), HPC, systems programming, and networking — so a lot of focus on efficiency and low-level behavior.
I had played a bit with FPGAs before, but nothing close to this scale — most of the hardware and Python internals work I had to figure out along the way.
Right now, the plan is to present it at PyCon first (next month) and then publish more about the internals afterward.
Long-term, I'm keeping an open mind, not sure yet.
My background is in high-frequency trading (HFT), high-performance computing (HPC), systems programming, and networking.
I didn't come from a HW background — or at least, I didn't have one when I started — but coming from the software side gave me a different perspective on how dynamic languages could be made much more efficient at the hardware level.
Difficult - adapting the Python execution model to my needs in a way that keeps it self-coherent, if that makes sense. This is still fluid and not finalized...
Easy - not sure I'd categorize it as easy, but it was more surprising: the current implementation is rather simple and elegant (at least I think so :-) ), with no advanced CPU design machinery yet (branch prediction, superscalar execution, etc.).
So even now, I'm getting a huge improvement over the CPython and MicroPython VMs in the known Python bottlenecks (branching, function calls, etc.).
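As an aside, the function-call overhead mentioned here is easy to observe on stock CPython with `timeit`; this is just a sketch, and the absolute numbers are machine-dependent:

```python
import timeit

def f(x):
    return x + 1

# Same arithmetic with and without a Python-level call; the gap between
# the two timings is the interpreter's per-call cost (frame setup,
# argument handling, dynamic dispatch).
inline = timeit.timeit("x + 1", setup="x = 0", number=1_000_000)
called = timeit.timeit("f(0)", globals=globals(), number=1_000_000)
print(f"inline: {inline:.3f}s  called: {called:.3f}s")
```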
> Difficult - adapting the Python execution model to my needs in a way that keeps it self-coherent if it makes sense. This is still fluid and not finalized...
Alright well those dots are begging me to ask what they mean, or at least one specific story for the nerds :-)
> Long-term, I'm keeping an open mind, not sure yet.
Well, please consider open source, even if you charge for access to your open source code. And even if you don't go open source, at least make it cheap enough that a solo developer could afford to build on it without a second thought.
Not yet — I'm currently testing on a Zynq-7000 platform (embedded-class FPGA), mainly because it has an ARM CPU tightly integrated (and it's rather cheap).
I use the ARM side to handle IO and orchestration, which lets me focus the FPGA fabric purely on the Python execution core, without having to build all the peripherals from scratch at this stage.
To run PyXL on a server-class FPGA (like Azure instances), some adaptations would be needed — the system would need to repurpose the host CPU to act as the orchestrator, handling memory, IO, etc.
The question is: what's the actual use case for running on a server? Besides testing max frequency -- for which I could just run Vivado against a different target (though I'd need a license for that).
For now, I'm focusing on validating the core architecture, not just chasing raw clock speeds.
This is still an early-stage project — it's not completed yet, and fabricating a custom chip would involve huge costs.
I'm a solo developer who worked on this in my spare time, so an FPGA was the most practical way to prove the core concepts and validate the architecture.
Longer term, I definitely see ASIC fabrication as the way to unlock PyXL’s full potential — but only once the use case is clear and the design is a little more mature.
Oh, my comment wasn't meant as a criticism, just curiosity; I would have been extremely surprised to see such a project actually fabricated.
I find the idea of a processor designed for a specific very high level language quite interesting. What made you choose Python, and do you think it's the "correct" language for such a project? It sure seems convenient as a language, but I wouldn't have thought it best suited to the task, given its very dynamic nature. Perhaps something like Nim, which is similar but a little less dynamic, would be a better choice?
I'm not super versed in hardware, but what's the reason you can't adapt this to run on an ARM microprocessor chip? Why go with an FPGA?
Like, if I could buy a Cortex board, write Python, hit compile, and have the thing run, this would be INSANELY useful to me, because Cortex chips have pretty great A/D converters for sensing.
There are definitely some limitations beyond just memory or OS interaction.
Right now, PyXL supports a subset of real Python. Many features from CPython are not implemented yet — this early version is mainly to show that it's possible to run Python efficiently in hardware.
I'd prefer to move forward based on clear use cases, rather than trying to reimplement everything blindly.
Also, some features (like heavy runtime reflection, dynamic loading, etc.) would probably never be supported, at least not in the traditional way, because the focus is on embedded and real-time applications.
As for the design process — I’d love to share more!
I'm a bit overwhelmed at the moment preparing for PyCon, but I plan to post a more detailed blog post about the design and philosophy on my website after the conference.
In terms of a feature set to target, would it make sense to go after RPython instead of "real" Python? That would let you leverage all the work PyPy has done on separating the essential primitives required to make a Python from the sugar and abstractions that make it familiar.
> I'd prefer to move forward based on clear use cases
Taking the concrete example of the `struct` module as a use case, I'm curious whether you have a plan for it and similar modules. The tricky part, of course, is that it's implemented in C.
Would you have to rewrite those stdlib modules in pure Python?
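For the common cases a pure-Python rewrite seems at least feasible. As a hypothetical sketch (little-endian unsigned ints only, nowhere near the real module's coverage), `pack_le` is a made-up name:

```python
import struct  # only used here to cross-check the sketch against the C module

# Sizes in bytes for a tiny subset of struct format codes.
_SIZES = {"B": 1, "H": 2, "I": 4}

def pack_le(fmt, *values):
    """Pack unsigned integers little-endian; fmt like '<BHI'."""
    assert fmt.startswith("<"), "only little-endian supported in this sketch"
    out = bytearray()
    for code, value in zip(fmt[1:], values):
        # int.to_bytes does the byte-order work that struct's C code
        # would otherwise handle.
        out += value.to_bytes(_SIZES[code], "little")
    return bytes(out)

assert pack_le("<BHI", 1, 2, 3) == struct.pack("<BHI", 1, 2, 3)
```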
Peripheral drivers (like UART, SPI, etc.) are definitely on the roadmap - they'd obviously be implemented in HW.
You're absolutely right — once you have basic IO, you can make things like print() and filesystem access feel natural.
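For illustration, here's one way print() could ride on a single byte-write primitive, the way MicroPython-style ports layer their streams. `uart_write_byte` is a made-up stand-in for a hardware hook, captured into a list so the sketch runs anywhere:

```python
import io

sent = []  # stand-in capture for bytes a real UART would transmit

def uart_write_byte(b):
    # Assumed low-level hook; on real hardware this would poke a
    # transmit register instead of appending to a list.
    sent.append(b)

class UARTWriter(io.RawIOBase):
    """File-like wrapper that funnels every byte through the UART hook."""
    def writable(self):
        return True
    def write(self, data):
        for b in bytes(data):
            uart_write_byte(b)
        return len(data)

# Wrap the raw stream so print() can target it like any other file.
uart = io.TextIOWrapper(io.BufferedWriter(UARTWriter()),
                        encoding="utf-8", newline="\n")
print("hi", file=uart)
uart.flush()
assert bytes(sent) == b"hi\n"
```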
Regarding limitations: you're right again.
PyXL currently focuses on running a subset of real Python — just enough to show that it is real Python and to prove the core concept, while keeping the system small and efficient for hardware execution.
I'm intentionally holding off on implementing higher-level features until there's a real use case, because embedded needs can vary a lot, and I want to keep the system tight and purpose-driven.
Also, some features (like threads, heavy runtime reflection, etc.) will likely never be supported — at least not in the traditional way — because PyXL is fundamentally aimed at embedded and real-time applications, where simplicity and determinism matter most.
Are you planning on licensing the IP core? It would be great to have your core integrated with ESP32, running alongside their other architectures, so they can handle the peripheral integration, wifi, and Python code loading into your core, while it sits as another master on the same bus as the other peripherals.
That's a great question!
I actually thought a lot about that early on.
In theory, you could build a CPU that directly interprets Python bytecode — but Python bytecode is quite high-level and irregular compared to typical CPU instructions.
It would add a lot of complexity and make pipelining much harder, which would hurt performance, especially for real-time or embedded use.
By compiling the Python bytecode ahead of time into a simpler, stack-based ISA (what I call PySM), the CPU can stay clean, highly pipelined, and efficient.
It also opens the door in the future to potentially supporting other languages that could target the same ISA!
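One way to see that irregularity: a single CPython add opcode can end up executing arbitrary user code, because operator dispatch happens at runtime:

```python
class Weird:
    def __add__(self, other):
        # This method runs inside the same BINARY_ADD / BINARY_OP opcode
        # that handles 1 + 1.
        return 42

# The interpreter resolves __add__/__radd__ dynamically; a hardware
# decoder that executed CPython bytecode directly would have to carry
# all of that dispatch machinery behind every "simple" instruction.
assert Weird() + 1 == 42
assert 1 + 1 == 2  # same opcode, completely different work
```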