Speaker: Garth Goodson
Network Appliance
Wednesday, May 16
12:00 pm - 1:00 pm
EBU3b 4140
ABSTRACT
The reliability measures in today's disk drive-based storage systems focus predominantly on
protecting against complete disk failures. Previous disk reliability studies have analyzed
empirical data in an attempt to better understand and predict disk failure rates. Yet, very
little is known about the incidence of latent sector errors, i.e., errors that go undetected
until the corresponding disk sectors are accessed.
Our study analyzes data collected from production storage systems over 32 months across 1.53
million disks (both nearline and enterprise class). We analyze factors that impact latent
sector errors, observe trends, and explore their implications for the design of reliability
mechanisms in storage systems.
This talk will describe our latent sector error data, the analysis we performed, and the
observations we drew from the data. Among other topics, the talk will discuss the temporal and
spatial locality of successive errors, how error rates change over time, and how effective
we believe disk scrubbing to be.
BIO
Garth Goodson is a member of the Advanced Development Group at Network Appliance (NetApp).
At NetApp, Garth co-authored the Parallel NFS (pNFS) specification, which is part of
the new NFSv4 minor version one draft. pNFS is an extension within the NFSv4 protocol that
provides clients with direct, parallel access to data. He is also interested in
virtual machines, software fault isolation, and I/O performance. Prior to joining Network
Appliance in 2004, Garth received his Ph.D. from Carnegie Mellon University in Electrical
and Computer Engineering under Greg Ganger. For his thesis, Garth developed novel
Byzantine fault-tolerant protocols using erasure-coding schemes to provide consistency in
a distributed storage system (PASIS). He received his MS and BS from CMU in 2000 and
1998, respectively. While at CMU, he also worked on user-level networking and on a versioning
file system as part of the Self-Securing Storage (S4) project.