kernelsanderz · 2026-02-13T02:23:31 1770949411

Theo’s SnitchBench is a good data-driven benchmark for this type of behavior. In fairness, though, the models are prompted to be bold and take action, so it doesn’t necessarily represent out-of-the-box behavior or models deployed in a user-facing platform.

https://snitchbench.t3.gg/

kernelsanderz · 2025-07-30T20:47:49 1753908469

https://archive.is/vXfbb

kernelsanderz · 2025-07-15T13:53:12 1752587592

I’ve been excited about LanceDB and its ability to support vector indexes and efficient row-level lookups. I wonder if this approach would work for their design goals while still allowing broader backwards compatibility with the Parquet ecosystem. I’ve also been intrigued by DuckLake, which has leaned into Parquet. Perhaps this approach will allow more flexible indexing while keeping support for the broader Parquet ecosystem, which is significant.

kernelsanderz · 2025-06-20T23:31:21 1750462281

Marimo is really special and solves most of the problems that you have with Jupyter. For the Marimo-curious, I strongly recommend checking out their YouTube channel. So much effort has gone into making these videos really great. https://youtube.com/@marimo-team?si=ZGaf8Zgq5WN3LKRg

kernelsanderz · 2025-06-07T04:03:05 1749268985

I’ll read this tomorrow

kernelsanderz · on Feb 24, 2025

For another library that has great performance, plus features like full-text indexing and the ability to version changes, I’d recommend LanceDB: https://lancedb.github.io/lancedb/

Yes, it’s a vector database and has more complexity. But you can use it without creating indexes, and it also has excellent zero-copy Arrow support for Polars and pandas.

daveguy · on Feb 24, 2025

Since a lot of ML data is stored as parquet, I found this to be a useful tidbit from lancedb's documentation:

> Data storage is columnar and is interoperable with other columnar formats (such as Parquet) via Arrow

https://lancedb.github.io/lancedb/concepts/data_management/

Edit: That said, I am personally a fan of Parquet, Arrow, and Ibis. There are so many data-wrangling options out there that it's easy to get analysis paralysis.

esafak · on Feb 24, 2025

Lance is made for this stuff; parquet is not.

3abiton · on Feb 24, 2025

How well does it scale?

kernelsanderz · on Oct 20, 2024

Also worth checking out https://github.com/jasonwhite/rudolfs

I’ve been using it to store datasets via Git LFS. It’s written in Rust and has been very reliable.

kernelsanderz · on Oct 20, 2024

I’ve been using https://github.com/jasonwhite/rudolfs, which is written in Rust. It’s high performance but doesn’t have all the features (such as auth) that you might need.

kernelsanderz · on Sept 10, 2024

https://archive.is/S7UEJ

kernelsanderz · on Aug 28, 2024

https://archive.is/do6tG