
I wonder what this means for practical system design. Do people currently build assumptions about hard drive failure patterns into their systems, in a way that they should change? I suppose independent failure (i.e. copying data to two drives is better than storing it on just one) is the main assumption behind e.g. RAID; I wonder whether Google has any new insight there.


You should be able to improve over naive RAID by pairing a relatively-high-probability-of-failure drive with a low-probability one. I.e., what you *shouldn't* do is the common practice of putting two new drives in a mirror, since both are in the infant-mortality part of the failure curve. What this data suggests is that you'll get a smaller chance of losing data (via simultaneous failure) if you pair a new drive with an older, "proven" one (but not one so old that it's nearing end of life).
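A toy calculation makes the point. The per-period failure probabilities below are made-up illustrative numbers (not from the paper), and it assumes independent faults within the mirror:

```python
# Toy comparison of mirror pairings under a bathtub failure curve.
# The probabilities are illustrative assumptions, not measured rates.
P_NEW = 0.05      # infant-mortality phase
P_PROVEN = 0.01   # mid-life, past infant mortality
P_OLD = 0.08      # wear-out phase

def p_both_fail(p_a, p_b):
    """Chance both halves of a mirror fail in the same period,
    assuming independent faults (the classic RAID assumption)."""
    return p_a * p_b

print(f"new + new:    {p_both_fail(P_NEW, P_NEW):.4f}")     # 0.0025
print(f"new + proven: {p_both_fail(P_NEW, P_PROVEN):.4f}")  # 0.0005
print(f"new + old:    {p_both_fail(P_NEW, P_OLD):.4f}")     # 0.0040
```

Under these (assumed) numbers, new+proven is 5x safer than new+new, and new+old is the worst pairing of the three.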


infant mortality is practically non-existent for enterprise-class drives and rare for consumer-class drives.

there is something to gain by using drives from different manufacturers (or different lots from the same manufacturer) within an array.


Companies would be interested in saving on cooling costs if it's not providing any significant benefit.


yes, the classic RAID paper assumes that faults are independent. this is not the case.

some recent work extends the basic analysis to deal with correlated faults.
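A minimal sketch of why the independence assumption matters. For two fault events with equal marginal probability p and Pearson correlation rho, the joint probability is p^2 + rho*p*(1-p); the numbers below are illustrative, not from any study:

```python
def p_joint(p, rho):
    """Probability that both drives fail, given marginal failure
    probability p for each and correlation rho between the two
    fault events (rho = 0 recovers the independence assumption)."""
    return p * p + rho * p * (1 - p)

p = 0.02  # assumed per-period failure probability
for rho in (0.0, 0.1, 0.5):
    print(f"rho={rho}: P(both fail) = {p_joint(p, rho):.5f}")
# rho=0.0 -> 0.00040; rho=0.1 -> 0.00236; rho=0.5 -> 0.01020
```

Even a modest correlation of 0.1 makes simultaneous failure about 6x more likely than the independent-fault analysis predicts, which is why the classic RAID reliability numbers can be badly optimistic.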



