At 4.8TB one could add a header section with the full code, instructions on how to compile it, etc. That would certainly help to reproduce it, assuming civilizations in 10k years can still decipher today's languages.
It's correct that the number of reinsurers is smaller than that of primary insurers. But the risk borne by reinsurers is less correlated, not more. Any given primary insurer has risk clusters (domestic market, line of business, etc.). If a large catastrophe happens in their domestic market they might go bust, but what are the chances that it happens simultaneously in all markets globally?
Say you're a primary home insurer in the US. If a hurricane hits, you might not have enough capital to rebuild all the homes. A reinsurer which also covers Europe, Asia, LatAm, etc. is less likely to go bankrupt. The reinsurer can cross-subsidize and use the insurance premiums from other regions to pay out the claims from the US market. All that matters is that on average the loss probabilities and severities are estimated correctly.
And this is just using one line of business as an example; reinsurers cover property, casualty, life and health, which adds extra layers of diversification.
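To make the diversification argument concrete, here's a toy Monte Carlo sketch. All numbers (catastrophe probability, loss severity, premiums, capital) are made up, and markets are assumed independent, so treat it as an illustration rather than an actuarial model:

```python
# Toy simulation: probability of ruin for a single-market insurer vs.
# a reinsurer pooling several independent markets. Numbers are invented.
import random

random.seed(0)
N = 100_000          # simulated years
p_cat = 0.05         # assumed probability of a catastrophe per market per year
loss, premium = 10.0, 1.0   # assumed loss severity and premium per market

def ruin_probability(n_markets: int) -> float:
    """Fraction of years where pooled losses exceed pooled premiums plus capital."""
    capital = 2.0 * n_markets   # assume capital scales with the size of the book
    count = 0
    for _ in range(N):
        losses = sum(loss for _ in range(n_markets) if random.random() < p_cat)
        if losses > n_markets * premium + capital:
            count += 1
    return count / N

print(f" 1 market : {ruin_probability(1):.3%} ruin probability")   # ~5%
print(f"10 markets: {ruin_probability(10):.3%} ruin probability")  # ~0.1%
```

With one market, any catastrophe wipes out the insurer; with ten independent markets it takes four simultaneous catastrophes, which is far rarer even though the expected loss per market is identical.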
From what I understood, the article refers to the point that DuckDB doesn't provide its own dataframe API, i.e. a way to express SQL queries through Python classes/functions.
The link you shared shows how DuckDB can run SQL queries on a pandas dataframe (e.g. `duckdb.query("<SQL query>")`). The SQL query in this case is a string. A dataframe API would allow you to write it entirely in Python. An example of this is polars dataframes (`df.select(pl.col("...").alias("...")).filter(pl.col("...") > x)`).
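To make the difference concrete, a minimal sketch (assumes duckdb, pandas, and polars are installed; the column names are made up for illustration):

```python
import duckdb
import pandas as pd
import polars as pl

pdf = pd.DataFrame({"city": ["Berlin", "Paris"], "temp": [18, 23]})

# DuckDB: the query is a plain SQL string, even though it scans the
# pandas DataFrame `pdf` found in local scope.
duckdb.query("SELECT city, temp FROM pdf WHERE temp > 20").to_df()

# Polars: the same query expressed entirely as Python expressions.
pl.from_pandas(pdf).filter(pl.col("temp") > 20).select("city", "temp")
```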
Dataframe APIs benefit from autocompletion, error handling, syntax highlighting, etc., which SQL strings don't. Please let me know if I missed something from the blog post you linked!
I suspect so; OpenAI is subject to the EU AI Act [0]. When they released Advanced Voice Mode it also took some time before it became available in the EU. Not sure why the UK and Switzerland are delayed as well, since they are not in the European Union.
I'm using both Spark and polars; to me the additional appeal of polars is that it's also much faster and easier to set up.
Spark is great if you have large datasets since you can easily scale, as you said. But if the dataset is small-ish (<50 million rows) you hit a lower bound in Spark in terms of how fast the job can run: even if the job is super simple it takes 1-2 minutes. Polars on the other hand is almost instantaneous (<1 second). Doesn't sound like much, but to me it makes a huge difference when iterating on solutions.
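For illustration, a rough sketch of the kind of comparison I mean (sizes and timings are illustrative; the Spark side is omitted since it needs a JVM/session, which is exactly the overhead in question):

```python
# Time a simple aggregation over ~10M rows in polars.
import time
import numpy as np
import polars as pl

n = 10_000_000
df = pl.DataFrame({"key": np.arange(n) % 100, "val": np.arange(n)})

t0 = time.perf_counter()
df.group_by("key").agg(pl.col("val").sum())
print(f"polars aggregation over {n:,} rows: {time.perf_counter() - t0:.2f}s")
# On a laptop this finishes well under a second; an equivalent PySpark job
# typically spends more time just starting the session than polars needs end to end.
```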
Yes, I only found the announcement [1] that the Polars team and NVIDIA engineers are working on a GPU engine, but other than that no concrete examples. GitHub issues also don't provide any hints on the status, only one open item [2] where most comments predate the announcement.
To add to that, an additional benefit is that you can compile and release it as a Python package (PyO3/maturin) or compile it to WASM so it runs in the browser (with JavaScript bindings). This makes the code portable while benefiting from Rust's performance and memory safety.
In short: compatible with existing Spark jobs but executing them much faster. Benchmarks in the README file and docs [1] show improvements of up to 3x, and not even all operations are implemented yet (if an operation is not available in Comet it falls back to Spark), so there is room for further improvement. Across all TPC-H queries the total speedup is currently 1.5x; the docs state that based on DataFusion's standalone performance 2x-4x is a realistic goal [1].
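For a sense of what "compatible with existing Spark jobs" means in practice, here's a minimal sketch of enabling Comet on an unchanged PySpark job. The jar path is hypothetical and the config keys are how I remember them from the Comet README, so double-check them against the docs [1] for your version:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("comet-demo")
    # Load the Comet plugin jar and enable the native execution engine.
    # Path and config keys are assumptions; verify against the Comet docs.
    .config("spark.jars", "/path/to/comet-spark-shaded.jar")
    .config("spark.plugins", "org.apache.spark.CometPlugin")
    .config("spark.comet.enabled", "true")
    .config("spark.comet.exec.enabled", "true")
    .getOrCreate()
)

# The job itself is unchanged: operators Comet supports run natively,
# everything else falls back to regular Spark execution.
df = spark.range(10_000_000).selectExpr("id % 100 AS key", "id AS val")
df.groupBy("key").sum("val").show()
```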
Haven't seen any memory consumption benchmarks, but I suspect it's lower than Spark for the same jobs since DataFusion is designed from the ground up to be columnar-first.
For companies spending hundreds of thousands if not millions on compute, this would mean substantial savings with little effort.