FreeBSD and ZFS — horrible raidz1 read speed

I’ve been noticing what appear to be absolutely horrible read speeds from a ZFS raidz1 pool, but only in certain circumstances: specifically, when mutt is used with Maildir header caching.

The setup is as follows:

ad4: 190782MB <WDC WD2000JD-00HBB0 08.02D08> at ata2-master SATA150
ad8: 715404MB <WDC WD7501AALS-00J7B0 05.00K05> at ata4-master SATA300
ad10: 715404MB <WDC WD7501AALS-00J7B0 05.00K05> at ata5-master SATA300
ad12: 715404MB <WDC WD7500AACS-00D6B0 01.01A01> at ata6-master SATA300
ad14: 715404MB <WDC WD7500AACS-00D6B0 01.01A01> at ata7-master SATA300

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0

Filesystem   1024-blocks      Used      Avail Capacity  Mounted on
storage/home    67108864     95232   67013632     0%    /home
storage       2149507840 227146240 1922361600    11%    /storage
/dev/ad4s1e      8122126     52450    7419906     1%    /tmp

This shows that /home (and therefore /home/jdc/Maildir) lives on the ZFS pool storage, a raidz1 made up of the four SATA300 disks (ad8 through ad14), while the UFS2 filesystems (/tmp and the rest of the OS disk) live on the single SATA150 disk, ad4.
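
(For completeness, the dataset layout can be confirmed directly from ZFS rather than inferred from df; something like the following should show it.)

$ zfs list -r storage
$ zfs get mountpoint storage/home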

Now for the relevant bits from my muttrc:

set mbox_type=Maildir
set folder="~/Maildir"
set mbox="~/Maildir"
set spoolfile="~/Maildir"
set header_cache="~/Maildir/header_cache.db"
set maildir_header_cache_verify=no

Now for the tests. First, we populate the Maildir header cache:

$ rm ~/Maildir/header_cache.db
$ mutt -f ~/Maildir/system
$ ls -l ~/Maildir/header_cache.db
-rw-------    1 jdc       users     671744  1 Jun 20:10 /home/jdc/Maildir/header_cache.db

Next, we copy the contents of ~/Maildir/system (which is on the ZFS pool) to /tmp/Maildir/system (which is on UFS2):

$ rsync -a ~/Maildir/system /tmp/Maildir/

mutt will use ~/Maildir/header_cache.db no matter what folder we pass to the -f flag (see header_cache in the muttrc above), so both runs below go through the same header cache file; only the location of the Maildir itself differs.
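
(If anyone wants to verify that, running mutt under truss(1) and grepping the output for the cache path should do it; the file names below are just examples.)

$ truss -f -o /tmp/mutt.truss mutt -f /tmp/Maildir/system
$ grep header_cache.db /tmp/mutt.truss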

And now it’s time to prove my statement.

$ time mutt -f ~/Maildir/system

real    0m3.447s
user    0m0.030s
sys     0m0.022s
$ time mutt -f /tmp/Maildir/system

real    0m0.233s
user    0m0.013s
sys     0m0.022s
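
If it would help, I can also watch per-disk activity while mutt opens the slow folder (to see whether the raidz1 vdev is actually being read, or whether the time is being spent elsewhere), e.g. by running this in another terminal during the test:

$ zpool iostat -v storage 1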

The only ZFS-related tunable I’ve set is vfs.zfs.prefetch_disable="1".
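
(For reference, that is set at boot time; assuming it lives in /boot/loader.conf, the entry is the line below, and the current value can be double-checked at runtime with sysctl.)

vfs.zfs.prefetch_disable="1"

$ sysctl vfs.zfs.prefetch_disable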

I’m really not sure what to make of this: we’re talking about a common operation that is roughly 15 times slower on ZFS than on UFS2 (3.447s vs. 0.233s real time). ZFS’s ARC should already have all of this data cached in memory, so I’m not sure where the delays are coming from. I don’t think it’s ZIL-related, since the ZIL only comes into play for writes. Surely raidz1 isn’t *that* slow for reads…?
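
If anyone wants numbers on whether these reads are even being served from the ARC, I can compare the hit/miss counters before and after the slow run; the relevant sysctls should be something like:

$ sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
$ mutt -f ~/Maildir/system
$ sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses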

And before anyone claims my disks are slow or responsible for the problem…

# dd if=/dev/ad4 of=/dev/null bs=64k
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    733    733  46883    1.4      0      0    0.0   99.2| ad4

# dd if=/dev/ad8 of=/dev/null bs=64k
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1   1659   1659 106156    0.6      0      0    0.0   97.0| ad8

# dd if=/dev/ad10 of=/dev/null bs=64k
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1   1739   1739 111267    0.6      0      0    0.0   96.8| ad10

# dd if=/dev/ad12 of=/dev/null bs=64k
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1   1461   1461  93510    0.7      0      0    0.0   98.4| ad12

# dd if=/dev/ad14 of=/dev/null bs=64k
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1   1375   1375  88017    0.7      0      0    0.0   98.3| ad14

They’re not: as the sequential dd reads and the accompanying gstat(8) output above show, every disk in the pool sustains somewhere between roughly 85 and 110 MB/s.

I’m willing to bet that if I bust out ktrace(1) on this, the delays will show up in the read(2) calls.
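
Something along these lines should confirm or refute that (file names are just examples):

$ ktrace -f /tmp/mutt.ktrace mutt -f ~/Maildir/system
$ kdump -f /tmp/mutt.ktrace -R | grep -w read | less

With -R, kdump prints timestamps relative to the previous entry, so any big gaps should show up right next to the offending read(2) calls.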