
In short: it's compatible with existing Spark jobs but executes them much faster. Benchmarks in the README and docs [1] show improvements of up to 3x, even though not all operations are implemented yet (if an operation is not available in Comet, it falls back to Spark), so there is room for further improvement. Across all TPC-H queries the total speedup is currently 1.5x; the docs state that, based on DataFusion's standalone performance, 2x-4x is a realistic goal [1].

I haven't seen any memory consumption benchmarks, but I suspect it's lower than Spark for the same jobs, since DataFusion is designed from the ground up to be columnar-first.

For companies spending hundreds of thousands, if not millions, on compute, this would mean substantial savings with little effort.
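To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes compute cost scales linearly with job runtime, which won't hold exactly in practice (fixed cluster costs, idle time, etc.), and the 1,000,000 annual spend is just an illustrative figure, not something from the docs. The function name is mine.

```python
def compute_savings(annual_spend: float, speedup: float) -> float:
    """Estimated annual savings, assuming cost is proportional to runtime.

    A speedup of s means jobs take 1/s of the original time,
    so the fraction of spend saved is 1 - 1/s.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return annual_spend * (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    # 1.5x is the current TPC-H total; 2x and 4x are the stated goal range.
    for s in (1.5, 2.0, 4.0):
        saved = compute_savings(1_000_000, s)
        print(f"{s}x speedup: roughly {saved:,.0f} saved per 1,000,000 spent")
```

So under that assumption the current 1.5x already cuts about a third of the bill, and the 2x-4x goal would cut half to three quarters.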

[1] https://datafusion.apache.org/comet/contributor-guide/benchm...


