Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Efficient slicing happens because of "tiling", hence the name TileDB. A tile is similar to an HDF5 or Zarr "chunk", or more loosely to a Parquet page. Although totally configurable, tiling is handled solely by TileDB, the user doesn't need to know about it. A tile is the atomic unit of IO and compression. TileDB maintains all the necessary metadata and indexing built into its format and, given a query, it knows how to fetch only the tiles that might include results. The tiles are decompressed in your memory and filtered further for the actual results. The dense array case is rather straightforward. The sparse case is a big differentiator in TileDB and it is quite challenging, especially in the presence of updates. TileDB handles the sparse case via bulk-loaded R-trees for multi-dimensional indexing, and via an LSM-tree-like approach with immutable objects that allows time traveling.

Concerning your point on potential errors occurring on S3, this is addressed by TileDB's immutable object approach. If an error occurs upon some write, there will be no array corruption. Happy to discuss about this topic on a separate thread.

Some related docs:

https://docs.tiledb.com/main/performance-tips/choosing-tilin...

https://docs.tiledb.com/main/basic-concepts/tile-filters#til...

https://docs.tiledb.com/main/basic-concepts/definitions/frag...



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: