Reading between the lines, it sounds as if they're using mmap. There is no "append" operation on a memory mapping, so the file would need to be preallocated before mapping it.
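To make the "no append" point concrete, here is a minimal sketch in Python of sizing a file first and mapping it afterwards. The helper names `map_preallocated` and `demo`, the file name, and the 1 MiB size are made up for illustration and say nothing about what the original poster's code does.

```python
import mmap
import os
import tempfile


def map_preallocated(path, size):
    """Create `path` with logical length `size`, then map it read/write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Set the length up front: the mapping covers a fixed range and
        # cannot grow by appending through it.
        os.ftruncate(fd, size)
        # On Unix, Python's mmap duplicates the descriptor, so closing
        # our fd below does not invalidate the mapping.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)


def demo():
    """Map a 1 MiB file, write a few bytes at the start, read them back."""
    with tempfile.TemporaryDirectory() as d:
        m = map_preallocated(os.path.join(d, "data.bin"), 1 << 20)
        try:
            m[0:5] = b"hello"
            return len(m), bytes(m[0:5]), bytes(m[5:8])
        finally:
            m.close()
```

Calling `demo()` returns the mapping length, the bytes just written, and a few untouched bytes after them, which read back as zeros because the file was only extended, never written.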
If the preallocation is done using fallocate or just writing zeros, then by default it's backed by blocks on disk, and readahead must hit the disk since there is data there. On the other hand, preallocating with fallocate using FALLOC_FL_ZERO_RANGE (which typically marks the range as unwritten extents) or (often) with ftruncate() (which just updates the logical file length) leaves no written data to fetch, so even if readahead is triggered it won't actually hit the disk.
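The difference described above can be observed from user space by comparing a file's logical size with its allocated size. The following is a rough sketch, assuming a Unix system where `st_blocks` is reported in 512-byte units. The helper names (`make_written`, `make_sparse`, `report`, `compare`) are hypothetical, and the allocated figures depend heavily on the filesystem (tmpfs, compression, delayed allocation), so treat them as indicative only.

```python
import os
import tempfile


def make_written(path, size):
    """Preallocate by actually writing zeros, then flush them to the file."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())


def make_sparse(path, size):
    """Preallocate by only extending the logical length (ftruncate)."""
    with open(path, "wb") as f:
        f.truncate(size)


def report(path):
    """Return (logical size, allocated bytes) for `path`."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux and most Unix systems.
    return st.st_size, st.st_blocks * 512


def compare(size=1 << 20):
    """Build one file each way and return both reports plus a sample read."""
    with tempfile.TemporaryDirectory() as d:
        written = os.path.join(d, "written.bin")
        sparse = os.path.join(d, "sparse.bin")
        make_written(written, size)
        make_sparse(sparse, size)
        with open(sparse, "rb") as f:
            sample = f.read(4096)  # reading a hole yields zeros
        return report(written), report(sparse), sample
```

On many filesystems the sparse file reports far fewer allocated bytes than the written one, even though both have the same logical size and both read back as zeros.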
I understand the case where the file is entirely preallocated, but for the file-hole case I'm not sure I understand why you'd get such high disk activity.
If the index block also got evicted from the page cache, then could reading into a file hole still trigger a fault? Or is the "holiness" of a page for a mapping stored in the page table?
I suspect page-sized, page-aligned file holes could be backed by a read-only zero page via the PTE as an optimization, but they might not be (I'm not as familiar with Linux mmap/filesystems as with FreeBSD).
It is quite possible the filesystem caches, e.g., the file extent tree (including holiness) separately from the backing inode/on-disk sectors for the tree.