Hacker Newsnew | past | comments | ask | show | jobs | submit | kernelsanderz's commentslogin

Theo’s snitch bench is a good data driven benchmark on this type of behavior. But in fairness the models are prompted to be bold to take actions. And doesn’t necessarily represent out of the box or models deployed in a user facing platform.

https://snitchbench.t3.gg/



I’ve been excited about lancedb and its ability to support vector indexes and efficient row level lookups. I wonder if this approach would work for their design goals and still allow broader backwards compatibility with the parquet ecosystem. Have been intrigued by Ducklake, and they’ve leaned into parquet. Perhaps this approach will allow more flexible indexing approaches with support for the broader parquet ecosystem which is significant.


Marimo is really special and solves most of the problems that you have with Jupyter. For those Marimo curious I strongly recommend checking out their YouTube channel. So much effort gone into making these videos really great. https://youtube.com/@marimo-team?si=ZGaf8Zgq5WN3LKRg


I’ll read this tomorrow


For another library that has great performance and features like full text indexing and the ability to version changes I’d recommend lancedb https://lancedb.github.io/lancedb/

Yes, it’s a vector database and has more complexity. But you can use it without creating indexes and it has excellent polars and pandas zero copy arrow support also.


Since a lot of ML data is stored as parquet, I found this to be a useful tidbit from lancedb's documentation:

> Data storage is columnar and is interoperable with other columnar formats (such as Parquet) via Arrow

https://lancedb.github.io/lancedb/concepts/data_management/

Edit: That said, I am personally a fan of parquet, arrow, and ibis. So many data wrangling options out there it's easy to get analysis paralysis.


Lance is made for this stuff; parquet is not.


How well does it scale?


Also worth checking out https://github.com/jasonwhite/rudolfs

Been using it to store datasets via lfs. Written in rust and has been very reliable.


I’ve been using https://github.com/jasonwhite/rudolfs - which is written in rust. It’s high performance but doesn’t have all the features (auth) that you might need.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: