arshajii's comments

Thanks a lot for all the comments and feedback! Wanted to add a couple points/clarifications:

- Codon is a completely standalone (from CPython) compiler that was started with the goal of statically compiling as much Python code as possible, particularly for scientific computing use cases. We're working on closing the gap further both in what we can statically compile, and by automatically falling back to CPython in cases we can't handle. Some of the examples brought up here are actually in the process of being supported via e.g. union types, which we just added (https://docs.exaloop.io/codon/general/releases).

- You can actually use any plain Python library in Codon (TensorFlow, matplotlib, etc.) — see https://docs.exaloop.io/codon/interoperability/python. The library code will run through Python though, and won't be compiled by Codon. (We are working on a Codon-native NumPy implementation with NumPy-specific compiler optimizations, and might do the same for other popular libraries.)

- We already use Codon and its compiler/DSL framework to build quite a few high-performance scientific DSLs. For example, Seq for bioinformatics (the original motivation for Codon), and others are coming out soon.
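As a rough sketch of the union-type fallback mentioned above (hedged — the exact inference rules are described in the linked release notes, and this is my own illustrative example): a function whose branches return different types gets a union type in Codon, while the same code runs unchanged under CPython, where it is simply dynamic:

```python
def describe(x):
    # In Codon, the inferred return type is a union of int and str;
    # in CPython, the same code just returns either type dynamically.
    if x >= 0:
        return x * 2
    return "negative"

print(describe(3))   # 6
print(describe(-1))  # negative
```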

Hope you're able to give Codon a try and looking forward to further feedback and suggestions!


I can see myself using Codon for projects in the future. One thing that concerns me, though, is "automatically falling back to CPython in cases we can't handle". Sometimes, I want the compilation to fail rather than fall back, because sometimes consistent high speed is a requirement. Please keep that in mind as you design that part.

Great work so far!


I'm curious: if Codon can compile a Python script, why can it not compile a pure Python library?

What technical limitations does an import or a third-party library add that a script wouldn't have?


NumPy, PyTorch, TensorFlow, and many other widely known third-party libraries are actually native code that interacts with CPython directly.


I'm very interested to try Codon, though I note there are no Windows binaries. Do you think building from source on Windows would be straightforward?


Excellent job. I can already see this being much more flexible than Numba and much more elegant/easy to use than Cython. Please keep it coming:)


I would guess the bulk of the time is being spent in compilation. You might try "codon build -release day_2.py" then "time ./day_2" to measure just runtime.


Good catch! Here are the updated runs:

   time python day_2.py

   ________________________________________________________
   Executed in   51.26 millis    fish           external
      usr time   23.38 millis   48.00 micros   23.33 millis
      sys time   21.88 millis  617.00 micros   21.26 millis

   time day_2

   ________________________________________________________
   Executed in  227.06 millis    fish           external
      usr time    8.17 millis   70.00 micros    8.10 millis
      sys time    6.69 millis  708.00 micros    5.98 millis

   time python day_8.py

   ________________________________________________________
   Executed in   53.63 millis    fish           external
      usr time   22.11 millis   51.00 micros   22.06 millis
      sys time   24.63 millis  714.00 micros   23.91 millis

   time day_8

   ________________________________________________________
   Executed in  115.89 millis    fish           external
      usr time    5.83 millis   92.00 micros    5.74 millis
      sys time    4.59 millis  856.00 micros    3.73 millis
Now Codon is much faster than Python.



Their benchmarks (https://exaloop.io/benchmarks) show that Codon is much, much faster than pypy. I also just tried some microbenchmarks with their fib example (iterated many times with higher parameters) and got similar results. It's unfortunate for now that this isn't open source, but it's really valuable to demonstrate to us Python lovers what's possible using LLVM!
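For reference, the fib microbenchmark in question is (something like) the classic naive recursion — this is my reconstruction, and the exact code in Exaloop's suite may differ:

```python
def fib(n: int) -> int:
    # Naive exponential recursion: a classic interpreter-overhead
    # benchmark, since nearly all time goes to function calls and
    # integer arithmetic.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(20))  # 6765
```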


Their benchmarks are not to be trusted (after reading the source).

- They cheat: they rewrite code to use Codon-specific features to "win" (i.e., parallelization and GPU optimizations).

- They don't warm up. They simply run the competition directly rather than allowing any sort of warmup. (In other words, they are measuring cold boot and startup time.)

Now, if they want to argue that startup time matters for performance, then fine. However, representing that as "20x faster!" is simply deception.

TBF, they are upfront about cheating:

> Further, some of the benchmarks are identical in both Python and Codon, some are changed slightly to work with Codon's type system, and some use Codon-specific features like parallelism or GPU.


Thanks for doing the work to point all of this out. "Benchmarketing".


We do have a benchmark suite at https://github.com/exaloop/codon/tree/develop/bench and results on a couple of different architectures at https://exaloop.io/benchmarks


Why do the C++ implementations perform so poorly?


My guess for word_count and faq is that the C++ implementation uses std::unordered_map, which famously has quite poor performance. [0]

[0] https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-01-o...


(I'm one of the developers on Seq.) We've actually been working mostly on closing the gap with Python for the last year or so. Seq can be useful for plain Python programs as well -- I give a bit more context in my comment above.


Hi everyone, I’m one of the developers on the Seq project — I was delighted to see it posted here! We started this project with a focus on bioinformatics, but since then we’ve added a lot of language features/libraries that have closed the gap with Python by a decent margin, and Seq today can be useful in other areas or even for general Python programs (although there are still limitations, of course). We’re in the process of creating an extensible, pluggable Python compiler based on Seq that allows for other domain extensions. The upcoming release also has some neat features like OpenMP integration (e.g. “@par(num_threads=10) for i in range(N): …” will run the loop with 10 threads). Happy to answer any questions!
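To give a feel for what the @par loop does: it is OpenMP-backed and Seq/Codon-specific, so the snippet below is only a rough CPython analogue using multiprocessing (a different mechanism — processes rather than OpenMP threads — and my own illustrative example, not Seq code):

```python
from multiprocessing import Pool

def square(i: int) -> int:
    return i * i

if __name__ == "__main__":
    # Rough CPython analogue of a `@par(num_threads=10)` loop:
    # distribute the iterations of a loop body across 10 workers.
    with Pool(processes=10) as pool:
        results = pool.map(square, range(100))
    print(sum(results))  # 328350
```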


Have follow-up benchmarks vs BioJulia been done since 2019? If I remember correctly, at the time the result was that BioJulia was faster once you considered that it did validation.


We haven't done too many comparisons with BioJulia since that paper, although we did address the (valid) issues they raised such as data validation (i.e. Seq now validates input data by default, but this can be optionally disabled). We did compare against them in our last paper in a sequence alignment benchmark: https://www.nature.com/articles/s41587-021-00985-6 (check the supplement).

