The detailed paper “Hard disk drives, the good, the bad and the ugly” by Jon Elerath describes the many failure modes of modern disk drives. It outlines the causes of silent data corruption and the steps HDD manufacturers take to recover from transient errors. This is interesting from a performance perspective, since the HDD recovery mechanism might show up as strange latencies on the read path.
Additionally, silent data corruption can stay latent in the system for a very long time, often becoming visible only when it touches the metadata of the filesystem built on top of the disk. Since filesystem metadata is a small fraction of the overall storage, there is always a chance that ‘real’ data has been destroyed too if an fsck is required on a filesystem. One great feature of Oracle is (was) that you could instruct it to calculate its own block checksums every time a backup was made. However, this adds time to the backup and is sometimes disabled.
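To make the block checksum idea concrete, here is a minimal sketch in Python. It is not how Oracle does it; the 8 KiB block size, the use of CRC32, and the function names `block_checksums` and `corrupted_blocks` are all my own assumptions for illustration. The idea is simply to record one checksum per block while the data is known to be good, then recompute later and see which blocks no longer match.

```python
import zlib
from typing import List

BLOCK_SIZE = 8192  # assumed block size, chosen only for illustration


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one CRC32 value per fixed-size block of data."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def corrupted_blocks(data: bytes, expected: List[int],
                     block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indices of blocks whose CRC32 differs from the recorded value.

    A block that is missing on either side is also reported.
    """
    actual = block_checksums(data, block_size)
    count = max(len(actual), len(expected))
    return [
        i for i in range(count)
        if i >= len(actual) or i >= len(expected) or actual[i] != expected[i]
    ]


if __name__ == "__main__":
    original = bytes(4 * BLOCK_SIZE)       # four blocks of zeros
    recorded = block_checksums(original)   # taken while the data is good

    damaged = bytearray(original)
    damaged[2 * BLOCK_SIZE + 5] ^= 0x01    # flip one bit inside block 2

    print(corrupted_blocks(bytes(damaged), recorded))  # prints [2]
```

The cost is visible even in the sketch: every block has to be read and hashed in full, which is the same reason the real feature adds time to a backup.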
I found this linked from Bryan Cantrill’s blog.