Unrepairable data corruption on a raidz2 when all drives show zero errors -- HOW???
So I'm just getting around to checking the logs on my backup server, and they say I have a permanently damaged file that ZFS can't repair.
How is this even possible on a raidz2 volume where every member shows zero errors and no drive has failed? Isn't that the whole point of raidz2, that if one (er, two) drives have a problem the data is still recoverable? How can I figure out why this happened and why it was unrecoverable, and, most importantly, how do I prevent it in the future?
It's only my backup server and the original file is still A-OK, but I'm really concerned here!
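For the "figure out why" part, here's roughly what I'm planning to run next (the wwn path is one of the anonymized drive IDs from the status output below; repeat per drive):

sudo zpool events -v data_pool3                          # low-level ZFS event log; shows details of the checksum error
sudo smartctl -a /dev/disk/by-id/wwn-0x5000ccaxxxxxxxx1  # SMART health for one member drive (repeat for all 8)
sudo journalctl -k --since "2023-11-12" | grep -iE 'ata|scsi|zfs'  # kernel messages from around the scrub window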
zpool status -v:
3-2-1-backup@BackupServer:~$ sudo zpool status -v
  pool: data_pool3
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 06:59:59 with 1 errors on Sun Nov 12 07:24:00 2023
config:

        NAME                        STATE     READ WRITE CKSUM
        data_pool3                  ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx1  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx2  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx3  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx4  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx5  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx6  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx7  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx8  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        data_pool3/(redacted)/(redacted)@backup_script:/Documentaries/(redacted)
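Side note on cleanup, in case anyone lands here later: the "@backup_script:" in that path means the damaged copy lives in a snapshot, so (assuming the live copy isn't also listed) my rough plan is to drop that snapshot and let the next backup run re-copy the file from the primary:

sudo zfs destroy data_pool3/(redacted)/(redacted)@backup_script  # drop the snapshot that references the bad blocks
sudo zpool clear data_pool3                                      # reset the pool's error counters
sudo zpool scrub data_pool3                                      # re-verify; the error listing should age out after clean scrubs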
It's definitely not the recent ZFS bug that others have mentioned here, simply because corruption caused by that bug can't show up like this: the bad data is written with valid checksums, so as far as ZFS is concerned the filesystem is consistent and a scrub finds nothing.
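For anyone who wants to check their own exposure anyway, here's roughly what I looked at, assuming we're all talking about the late-2023 silent-corruption issue where the common stopgap was the zfs_dmu_offset_next_sync tunable:

zfs version                                              # which OpenZFS release the box is actually running
cat /sys/module/zfs/parameters/zfs_dmu_offset_next_sync  # 0 disables the code path implicated in that bug
zpool get feature@block_cloning data_pool3               # whether block cloning is even active on this pool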