
Reading between the lines, it sounds as if they're using mmap. There is no "append" operation on a memory mapping, so the file would need to be preallocated before mapping it.

If the preallocation is done by writing zeros (or by any method that leaves written blocks behind), the file is backed by real data blocks on disk, and readahead must hit the disk since there is data there. On the other hand, preallocating with fallocate() using FALLOC_FL_ZERO_RANGE, or (often) extending the file with ftruncate(), leaves the range as unwritten extents or a hole with no written data behind it, so even if readahead is triggered it won't actually hit the disk.
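To make the difference between logical length and allocated blocks concrete, here is a minimal sketch, assuming Linux and a filesystem that supports sparse files. The helper names (make_sparse, make_preallocated, allocated_bytes) and the 1 MiB size are made up for illustration; the exact block counts depend on the filesystem.

```python
import os
import tempfile

SIZE = 1 << 20  # 1 MiB, an arbitrary size chosen for illustration


def allocated_bytes(path):
    # st_blocks counts 512-byte units that are actually allocated on disk.
    return os.stat(path).st_blocks * 512


def make_sparse(path, size=SIZE):
    # ftruncate() only extends the logical length; on most filesystems
    # the new range is a hole with no blocks behind it.
    with open(path, "wb") as f:
        os.ftruncate(f.fileno(), size)


def make_preallocated(path, size=SIZE):
    # posix_fallocate() asks the filesystem to reserve space for the range.
    with open(path, "wb") as f:
        os.posix_fallocate(f.fileno(), 0, size)


with tempfile.TemporaryDirectory() as d:
    sparse = os.path.join(d, "sparse.bin")
    full = os.path.join(d, "full.bin")
    make_sparse(sparse)
    make_preallocated(full)

    sparse_len, sparse_alloc = os.path.getsize(sparse), allocated_bytes(sparse)
    full_len, full_alloc = os.path.getsize(full), allocated_bytes(full)

    print(f"sparse:       length={sparse_len} allocated={sparse_alloc}")
    print(f"preallocated: length={full_len} allocated={full_alloc}")
```

Both files report the same length, but the sparse one will typically show far fewer allocated bytes. Note that this only shows what is reserved on disk; whether a later read actually hits the disk also depends on how the filesystem tracks unwritten extents.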



I understand the case where the file is entirely preallocated, but for the file-hole case I'm not sure why you'd get such high disk activity.

If the index block also got evicted from the page cache, then could reading into a file hole still trigger a fault? Or is the "holiness" of a page for a mapping stored in the page table?


I suspect page-sized, page-aligned file holes could be backed by a read-only zero page via the PTE as an optimization, but they might not be (I'm not as familiar with Linux mmap/filesystems as with FreeBSD).

It is quite possible the filesystem caches, e.g., the file extent tree (including holiness) separately from the backing inode/on-disk sectors for the tree.


Using _ftruncate_ or FALLOC_FL_ZERO_RANGE is a bad idea for a database. The problem is that you may get an out-of-disk-space error mid-operation.

If you are using mmap, that will express itself as a SIGBUS on the faulting write (a crash much like a segmentation fault), which you really don't want.

You _need_ to allocate the file ahead of time, so that running out of space shows up as an ordinary error at allocation time, where you can handle it properly.
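A minimal sketch of the allocate-first approach, assuming Linux and Python's os.posix_fallocate. The function name open_preallocated_map and the 16 KiB size are hypothetical. The point is that an out-of-space condition is raised as an OSError at allocation time, before any mapping exists, rather than as a signal during a store into the mapping.

```python
import mmap
import os
import tempfile


def open_preallocated_map(path, size):
    """Reserve `size` bytes for `path`, then map it read/write.

    If the disk is full, posix_fallocate() raises OSError (ENOSPC) here,
    where it can be handled like any other I/O error.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
        # Python's mmap duplicates the descriptor, so closing fd is safe.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "db.bin")
    m = open_preallocated_map(path, 4 * 4096)
    m[0:5] = b"hello"  # store into space that was reserved up front
    m.flush()
    m.close()

    with open(path, "rb") as f:
        data = f.read(5)
    file_size = os.path.getsize(path)
    print(data, file_size)
```

This does not cover every failure mode (copy-on-write filesystems, for instance, may still need new space on overwrite), so treat it as an illustration of the ordering rather than a complete recipe.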



