FreeBSD and ZFS — more performance quirks

I’ve been trying to work out the source of “strange” performance-related issues when using ZFS on FreeBSD. The situation is reproducible, but not always, and it seems the complex inner nature of ZFS is probably contributing to the problem.

The problem I’m trying to track down is what causes fread() operations, with 4KB reads, to suddenly start taking an absurd amount of time (0.2 seconds, for example) on a ZFS raidz1 pool with 4 disks, all capable of operating at 80-100MB/sec (read and write).

My understanding is that the slowness is expected initially, as ZFS’s ARC does not yet hold this data the first time around. And that’s fine; not to mention, it seems true. For example, today I performed the following test:

  1. rm Maildir/header_cache.db
  2. mutt -y
  3. Ran through all of the folders, thus populating the Maildir header cache. A folder with 400 files in it would take about 10 seconds. Then I exited mutt.
  4. Again, rm Maildir/header_cache.db
  5. mutt -y
  6. Once more, ran through all the folders. This time things were much faster, almost instantaneous. So populating the header cache was very quick, given that ZFS now had the files (or at least some of them) cached in the ARC.
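
The cold-versus-warm comparison above could be approximated without mutt by a small C helper. This is only a rough sketch under my own assumptions: the function name time_directory_pass is made up for illustration, it reads whole files rather than just the headers mutt parses, and it only looks at regular files directly inside one directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: read every regular file directly inside `dir`
 * from start to finish and return the wall-clock seconds the whole pass
 * took, or -1.0 if the directory cannot be opened. The number of files
 * read is stored in *nfiles when nfiles is non-NULL.
 */
double time_directory_pass(const char *dir, long *nfiles)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    char path[4096];
    char buf[4096];            /* 4KB reads, matching the reads in question */
    long count = 0;
    struct dirent *e;

    while ((e = readdir(d)) != NULL) {
        struct stat st;
        int len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;          /* path would not fit, skip it */
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;          /* skips ".", "..", and subdirectories */

        FILE *fp = fopen(path, "rb");
        if (fp == NULL)
            continue;
        while (fread(buf, 1, sizeof buf, fp) > 0)
            ;                  /* discard the data, only the I/O matters */
        fclose(fp);
        count++;
    }
    closedir(d);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (nfiles != NULL)
        *nfiles = count;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it twice in a row on the same folder (for example Maildir/cur, assuming that layout) and printing both results should show the same pattern as steps 3 and 6 if the ARC is doing its job: a slow first pass, then a much faster second one.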

This is an ideal situation, and makes sense. The ARC has a tendency to grow very large as more and more I/O happens, and that’s fine — that’s the nature of the ZFS beast.

However, where things get bizarre is when the above “logical” scenario stops occurring, with ZFS acting as if the ARC is “full” and choosing not to cache anything more for some reason. Again, I can reproduce this quite easily using the above procedure, but it’s not a very good real-world test to post to a mailing list. It’s getting to the point where I’m probably going to have to write some C code that mimics all of the possible scenarios. There is definitely a problem somewhere, and I think that’s the only way I’m going to get people to track it down.
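
As a starting point for that kind of test code, here is a minimal sketch that times individual 4KB fread() calls and counts the ones that exceed a threshold. The names (count_slow_reads, CHUNK_SIZE) are hypothetical, and it covers only the simplest case, a sequential read of a single file. It is not a full reproduction of what mutt does.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

#define CHUNK_SIZE 4096   /* 4KB, the read size described above */

/* Seconds elapsed between two CLOCK_MONOTONIC timestamps. */
static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec)
         + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: read `path` sequentially in 4KB chunks with
 * fread(), timing each call. Returns how many calls took longer than
 * `threshold_sec`, or -1 if the file cannot be opened or a read error
 * occurs. The slowest single call is stored in *worst_sec when
 * worst_sec is non-NULL.
 */
long count_slow_reads(const char *path, double threshold_sec, double *worst_sec)
{
    char buf[CHUNK_SIZE];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Ask stdio not to buffer, so the timings reflect the underlying
     * reads rather than hits in stdio's own buffer. */
    setvbuf(fp, NULL, _IONBF, 0);

    long slow = 0;
    double worst = 0.0;
    for (;;) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t n = fread(buf, 1, sizeof buf, fp);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (n == 0)
            break;             /* end of file or error */

        double dt = elapsed_sec(&t0, &t1);
        if (dt > worst)
            worst = dt;
        if (dt > threshold_sec)
            slow++;
    }

    int failed = ferror(fp);
    fclose(fp);
    if (worst_sec != NULL)
        *worst_sec = worst;
    return failed ? -1 : slow;
}
```

Calling count_slow_reads() with a threshold of 0.2 on a large file in the pool, once cold and once after the file has been read, would give a number that can be posted to a mailing list instead of “mutt feels slow.”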

But something occurred to me today while reading the Cache Flushes section of the ZFS Evil Tuning Guide: I was left wondering whether setting the FreeBSD equivalent of zfs_nocacheflush would improve things.

So I decided to set vfs.zfs.cache_flush_disable="1" in /boot/loader.conf to see how things perform.

Something tells me I’m probably going to have to create a ZFS-related WordPress Category in my blog just for this kind of thing. :-)