There’s been a large discussion on freebsd-stable@ as of late regarding how to hot-swap SATA hard disks that are part of a ZFS pool on FreeBSD. I wanted to take the opportunity to demonstrate how this works on real server-class hardware.
We have a system that contains 3 SATA drives: one SSD (for the OS, using standard UFS2; it isn’t part of the picture), and two 1TB drives in a ZFS mirror. The system has only 4 hot-swap drive bays, 3 of which are populated.
Without powering the system off, we want to replace the two existing 1TB drives with newer 1TB drives (which have more cache and offer SATA 3.x support, although the controller we’re using only supports up to SATA 2.x).
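Since one bay is free, each new drive can be inserted before its predecessor is removed. The following is a sketch of the procedure only, not a tested transcript; the pool name (tank) and device names (ada1/ada2 for the old disks, ada3 for the first new one) are hypothetical and will differ on your system:

```sh
zpool status tank             # confirm the mirror is healthy before starting
# ...insert the first new drive into the empty fourth bay...
camcontrol devlist            # verify the kernel attached the new disk (say, ada3)
zpool replace tank ada1 ada3  # resilver from the surviving mirror onto the new disk
zpool status tank             # watch until resilvering completes; ada1 is then detached
# ...pull ada1, insert the second new drive into its bay, and repeat for ada2...
```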
Those who haven’t read about my 8.0-RC1 experience should do so first:
Basically, my experience with 8.0-RC2 was identical to that of RC1, except some of the bugs/issues I experienced are now gone (hooray!). The fixes/improvements I noticed are below:
For many years now I’ve been dealing with an ongoing issue which to this day has no real solution: classic UNIX mailboxes (called mbox) rely on comparing the file’s mtime to its atime to determine whether there’s new mail inside the mailbox (if the mtime is greater than the atime, there’s new mail; if the mtime is less than the atime, the mail has been read / there is no new mail). “Classic mail spools” (e.g.
/var/spool/mail) are mbox.
Why is this a problem? Because those of us who use mutt/alpine/etc. on our UNIX machines, and who also do backups using things like
tar/cp/rsync (more on rsync in a moment), end up with mailboxes whose atime has been clobbered once the backup takes place. The end result: our mail clients no longer tell us there’s new mail in those mailboxes, which can be detrimental in many respects.
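The mtime/atime rule is easy to check from the shell. A minimal sketch (file name invented), using find(1)’s -newerXY primary, which exists in both FreeBSD find and GNU find:

```shell
# has_new_mail: exit 0 when the mailbox's mtime is newer than its atime,
# i.e. mail was delivered after the mailbox was last read.
# -newerma: "mtime of the file under test is newer than atime of the
# reference"; passing the same path twice compares the mailbox to itself.
has_new_mail() {
    find "$1" -prune -newerma "$1" 2>/dev/null | grep -q .
}

mbox=/tmp/demo.mbox                 # stand-in for /var/spool/mail/$USER
printf 'From demo\n' > "$mbox"      # "delivery": sets mtime to now
touch -a -t 200001010000 "$mbox"    # pretend it was last read back in 2000
has_new_mail "$mbox" && echo "new mail"
touch -a "$mbox"                    # reading it (or a backup!) bumps atime
has_new_mail "$mbox" || echo "no new mail"
```

This is exactly why a backup tool that opens the mailbox for reading makes the client go quiet: the read bumps atime past mtime.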
The most common rebuttal is “shut up and use Maildir“. What Maildir advocates don’t care to acknowledge is that there are many problems with the Maildir concept, particularly when used on a filesystem like ZFS. With classic mbox, a multi-megabyte mailbox loads quickly; with Maildir, which stores one file per mail, the end result is a mail client that takes forever to load. ZFS does not perform well with massive numbers of small files.
EDIT: Those interested in the upcoming release of FreeBSD 8.0 should read both the below, as well as my Testing out FreeBSD 8.0-RC2 post (which notes that many, but not all, of these problems have been fixed).
Yesterday I took the plunge and upgraded my home FreeBSD amd64 box from RELENG_7 to FreeBSD 8.0-RC1, which going forward I will refer to as RELENG_8 (yes, it has been tagged!). I did a complete reinstall, like I always do when migrating between major FreeBSD releases. Said box consists of a Supermicro X7SBA motherboard, Intel Core2Duo E8400 CPU, 4GB of RAM, and 3 SATA disks connected via the on-board Intel ICH9 + AHCI — one for the OS, and two in a ZFS mirror pool. Relevant
There’s an “age old question” that has been floating around with regards to ZFS on FreeBSD — is it stable? “Stable” in this case means: do I risk losing my data, will it cause kernel panics or other oddities, and do I need to tune it?
The answer, still, may be yes.
I’ve taken the initiative to get an official response to these types of questions, specifically with regards to kernel panics. I’m incredibly surprised that no one, not even the user community, has responded at this point. It’s not a trick question either; FreeBSD users really do need an answer to this.
People continually compare FreeBSD’s ZFS to ZFS on Solaris 10 and OpenSolaris. Given that my day job involves heavy use of Solaris 10 on massive numbers of servers across the United States, I can safely say without a doubt that ZFS on Solaris behaves better and won’t crash your system due to kernel memory exhaustion.
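For context, the “do I need to tune it?” part of the question usually came down to /boot/loader.conf entries like the following on 7.x/8.x. These tunables are real, but the values below are placeholders only; appropriate figures depend on installed RAM and workload:

```sh
# /boot/loader.conf -- illustrative values, not recommendations
vm.kmem_size="1024M"        # enlarge the kernel address space ZFS allocates from
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="512M"      # cap the ARC so it cannot exhaust kernel memory
```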
I just saw this commit come through for RELENG_7:
Add delta 220.127.116.11 2009.07.01.12.44.23 avg
The CVS commit log indicates this fixes a bug where NFS is being used on ZFS v13 exported filesystems, and the system mounting the NFS share attempts to
open(2) with flags O_CREAT and O_EXCL set. The file is created — 0 bytes in size, with mode 0000 — yet the operation returns EIO. This is pretty major:
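The failing code path is easy to exercise even from sh: with noclobber enabled (set -C), most shells implement > as an open(2) with O_CREAT|O_EXCL. A local sketch (the path is made up); against an affected ZFS-backed NFS mount, the first create below is where EIO came back:

```shell
set -C                      # noclobber: ">" now opens with O_CREAT|O_EXCL
f=/tmp/excl-demo.txt
rm -f "$f"
if true > "$f"; then        # exclusive create of a new file succeeds...
    echo "created, size $(wc -c < "$f" | tr -d ' ')"
fi
if ! true 2>/dev/null > "$f"; then   # ...a second one must fail: file exists
    echo "exclusive create refused"
fi
```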
I also enjoyed reading PR 135412 for this bug, where some developers made some amazing statements (my favourite being “use UFS2 instead of ZFS, use cp/rm instead of mv, don’t use NFS”). This further validates my concern that there isn’t enough QA being done prior to code being committed to the STABLE branches. I guess no one tested ZFS v13 filesystems being exported via NFS prior to the v13 commit?
Believe me, I’m thankful that ZFS v13 is now part of STABLE — sincerely I am — but my concern isn’t limited to ZFS: it applies to FreeBSD as a whole.
I’ve been trying to work out the source of “strange” performance-related issues when using ZFS on FreeBSD. The situation is reproducible, though not consistently, and the complex inner workings of ZFS are probably contributing to the problem.
The problem I’m trying to track down is what causes fread() operations with 4KB reads to suddenly start taking an absurd amount of time (0.2 seconds, for example) on a ZFS raidz1 pool of 4 disks, each capable of 80-100MB/sec (read and write).
My understanding is that the slowness is expected initially, since ZFS’s ZIL and ARC have no reference to the data the first time around. That’s fine, and it appears to be true. For example, today I performed the following test:
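A rough stand-in for that kind of test (the file path is invented, and a genuinely cold-cache run needs a fresh boot or an exported/imported pool) is to replay the same I/O pattern with dd(1):

```shell
f=/tmp/readtest.bin
# Build an 8 MB test file: 2048 blocks of 4KB each
dd if=/dev/zero of="$f" bs=4k count=2048 2>/dev/null
# Read it back in 4KB chunks, the same pattern as the fread() calls above.
# dd's summary line (on stderr) reports the aggregate rate; with a cold ARC
# the first pass stalls on disk, while warm-ARC passes are served from memory.
dd if="$f" of=/dev/null bs=4k
```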