Hacker News

I did early ZFSOnLinux development on hardware that did not have ECC memory. I once had a situation where a bit flip happened in the ARC buffer for libpython.so and all Python software started crashing. Initially, I thought I had hit some sort of bizarre bug in ZFS, so I started debugging. At that time, opening a file through a ZFS snapshot would fetch a second copy from disk into a separate ARC buffer, so while debugging, I ran cmp on libpython.so between the live copy and a snapshot copy. It showed the exact bit that had flipped. After seeing that and convincing myself the bit flip was not actually on stable storage, I rebooted, and all was well. Soon afterward, I got a new development machine that had ECC so that I would not waste my time chasing phantom bugs caused by bit flips.
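For anyone who wants to repeat that kind of comparison, here is a minimal sketch of what cmp effectively reports, taken one step further down to the bit level. The function name `find_bit_flips` and the sample bytes are made up for illustration; this is not how cmp or ZFS does it internally, just one way to locate a flipped bit between two copies of the same data.

```python
def find_bit_flips(live: bytes, snapshot: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs.

    bit_index 0 is the least significant bit of the byte.
    Only the common prefix is compared if the lengths differ.
    """
    flips = []
    for offset, (a, b) in enumerate(zip(live, snapshot)):
        diff = a ^ b  # XOR leaves a 1 in every position where the bytes disagree
        bit = 0
        while diff:
            if diff & 1:
                flips.append((offset, bit))
            diff >>= 1
            bit += 1
    return flips


# Hypothetical data: the "live" copy has one bit flipped in its second byte.
good = bytes([0x7F, 0x45, 0x4C, 0x46])
bad = bytes([0x7F, 0x55, 0x4C, 0x46])
print(find_bit_flips(bad, good))
```

With real files you would read both copies in binary mode and pass the contents in. A single (offset, bit) pair in the result is what you would expect from a lone memory bit flip, as opposed to the wider damage a bad write usually leaves.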


Erm, isn't ZFS supposed to store the checksum of records held in the ARC and verify it on read? Are you sure this is what happened?



