Writing FreeBSD memstick.img to a USB drive in Windows

I’ve received some hits on my blog from people looking for how to write the FreeBSD memstick.img image to a USB flash drive under Windows. The official FreeBSD procedure works great, but only applies if you already have access to a FreeBSD box. Accomplishing the same under Windows is more of a hassle, but not by much.

The solution: download John Newbigin’s dd for Windows. This is an enhanced version of dd which also lets you list raw devices in Windows, including USB sticks.

In the example below, I have a 4GB USB flash drive (HP, model v100w) connected to a USB port, under Windows XP SP3. This is the drive I want 8.0-RC2-amd64-memstick.img written to. The commands I typed are the lines starting with C:\>, and the device strings associated with the USB flash drive are the Harddisk1 entries:

C:\>dd --list
rawwrite dd for windows version 0.5.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by the GPL.  See copying.txt for details
Win32 Available Volume Information
\\.\Volume{1ff1b266-ab71-11de-b1e8-806d6172696f}\
  link to \\?\Device\HarddiskVolume1
  fixed media
  Mounted on \\.\c:

\\.\Volume{808faa36-bdbc-11de-a116-806d6172696f}\
  link to \\?\Device\HarddiskVolume2
  fixed media
  Mounted on \\.\d:

\\.\Volume{1ff1b262-ab71-11de-b1e8-806d6172696f}\
  link to \\?\Device\CdRom0
  CD-ROM
  Mounted on \\.\e:

\\.\Volume{3794d0ff-abb4-11de-9377-00221578190a}\
  link to \\?\Device\CdRom1
  CD-ROM
  Mounted on \\.\f:

\\.\Volume{ec4923e1-c907-11de-a118-00221578190a}\
  link to \\?\Device\Harddisk1\DP(1)0-0+12
  removeable media
  Mounted on \\.\g:

NT Block Device Objects
\\?\Device\CdRom0
  size is 2147483647 bytes
\\?\Device\CdRom1
  size is 2147483647 bytes
\\?\Device\Harddisk0\Partition0
  link to \\?\Device\Harddisk0\DR0
  Fixed hard disk media. Block size = 512
  size is 300069052416 bytes
\\?\Device\Harddisk0\Partition1
  link to \\?\Device\HarddiskVolume1
\\?\Device\Harddisk0\Partition2
  link to \\?\Device\HarddiskVolume2
\\?\Device\Harddisk1\Partition0
  link to \\?\Device\Harddisk1\DR17
  Removable media other than floppy. Block size = 512
  size is 4009754624 bytes
\\?\Device\Harddisk1\Partition1
  link to \\?\Device\Harddisk1\DP(1)0-0+12
  Removable media other than floppy. Block size = 512
  size is 4009730048 bytes
...

The device string we want is the NT Block Device, not the Win32 Volume, and we’re interested in the Partition0 entry, which refers to the whole disk rather than a single partition on it. Now that we know the device path, we can write memstick.img directly to it, using the same block size the official FreeBSD procedure recommends.
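If the listing is long, picking out the right entry can be done mechanically. What follows is a minimal Python sketch, not something that ships with dd for Windows: it scans the "NT Block Device Objects" part of a dd --list dump and reports the Partition0 entries flagged as removable media. The function name and the parsing rules are assumptions based only on the output format shown above; other versions of the tool may print something different.

```python
import re

def removable_whole_disks(listing: str) -> list[tuple[str, int]]:
    """Return (device path, size in bytes) for removable Partition0 entries."""
    results = []
    device = None              # device path of the entry currently being read
    removable = False          # did that entry report removable media?
    in_block_section = False
    for raw in listing.splitlines():
        line = raw.rstrip()
        if line.startswith("NT Block Device Objects"):
            in_block_section = True
            continue
        if not in_block_section:
            continue
        if line.startswith("\\\\?\\Device\\"):
            device, removable = line, False      # a new, unindented device entry
        elif device and "Removable media" in line:
            removable = True
        elif device and (m := re.match(r"\s+size is (\d+) bytes", line)):
            if removable and device.endswith("\\Partition0"):
                results.append((device, int(m.group(1))))
    return results

# Excerpt of the listing above, used as sample input.
SAMPLE = r"""NT Block Device Objects
\\?\Device\Harddisk0\Partition0
  link to \\?\Device\Harddisk0\DR0
  Fixed hard disk media. Block size = 512
  size is 300069052416 bytes
\\?\Device\Harddisk1\Partition0
  link to \\?\Device\Harddisk1\DR17
  Removable media other than floppy. Block size = 512
  size is 4009754624 bytes
\\?\Device\Harddisk1\Partition1
  link to \\?\Device\Harddisk1\DP(1)0-0+12
  Removable media other than floppy. Block size = 512
  size is 4009730048 bytes
"""

candidates = removable_whole_disks(SAMPLE)
print(candidates)
```

Treat the result as a hint only; still compare the reported size against the size of the stick before writing anything.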

Note that the conv=sync parameter has been removed (not needed here, and this version of dd doesn’t understand it anyway), and I’ve added the --progress flag which indicates how many bytes have been written in real time (useful).

Finally: please be sure you pick the correct device string! I won’t be held accountable if you screw this up and destroy your Windows machine’s hard disk. :-)

C:\>dd if=8.0-RC2-amd64-memstick.img of=\\?\Device\Harddisk1\Partition0 bs=10240 --progress
rawwrite dd for windows version 0.5.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by the GPL.  See copying.txt for details
1,044,858,880
102037+0 records in
102037+0 records out

Voilà.
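A quick sanity check on the dd output above: the record count multiplied by the block size should equal the byte counter printed by --progress. A small Python sketch of that arithmetic (the variable names are mine):

```python
BLOCK_SIZE = 10240               # the bs= value passed to dd
RECORDS_OUT = 102037             # "102037+0 records out"; +0 means no partial records
PROGRESS_BYTES = 1_044_858_880   # final figure printed by --progress

bytes_written = BLOCK_SIZE * RECORDS_OUT
print(bytes_written, bytes_written == PROGRESS_BYTES)
```

If the two figures ever disagree, or the count ends in something other than +0, the write is worth repeating.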

Testing out FreeBSD 8.0-RC2

Those who haven’t read about my 8.0-RC1 experience should do so first.

Basically, my experience with 8.0-RC2 was identical to that of RC1, except some of the bugs/issues I experienced are now gone (hooray!).

Fixes/improvements:

  • The issue I experienced with the Boot Manager selection phase of installation has been fixed. Also, Standard is now the default option (first choice).
  • The geometry does not match label problem has been addressed by fixing the FreeBSD slice editor in sysinstall/sade; see below.
  • The FreeBSD slice editor has been changed to properly work with GEOM. Commonly when installing FreeBSD on a box, people go into the slice editor and press “a” to use the entire disk. Previously, users would end up with a disk where the first 63 sectors were unused (probably for the PBR/MBR and overall alignment), then the FreeBSD slice, and a third “unused” portion of the disk (which, if I remember correctly, was done solely for alignment reasons). Example:
    Offset       Size(ST)        End     Name  PType       Desc  Subtype    Flags
    
             0         63         62        -     12     unused        0
            63  390716802  390716864    ad8s1      8    freebsd      165    A
     390716865       5103  390721967        -     12     unused        0
    

    Starting with RC2, this is what you’ll see:

    Offset       Size(ST)        End     Name  PType       Desc  Subtype    Flags
    
             0         63         62        -     12     unused        0
            63  390721905  390721967    ad8s1      8    freebsd      165    A
    

    Note the lack of the last “unused” section. This indicates that the FreeBSD slice can literally go to the very last block on the disk. GEOM is really looking great at this point!

    Sadly, this also means people will need to reinstall FreeBSD (specifically, deleting the slice and re-creating it) to benefit from this. As far as I know, you can’t fix this without a full reinstallation.

  • The EOF issue for ttys (re: ^D being shown) has been fixed and committed to CURRENT (FreeBSD 9.0), but hasn’t been MFC’d to RELENG_8 yet. Yes, it’s scheduled to be (in about 2 weeks). Big thanks to Ed Schouten for fixing this!
  • There were some ZFS commits which happened between RC1 and RC2 which may indicate that the ARC exhausting all available kmem is no longer possible. I have not been able to confirm/deny whether this fix works, but looking at the code, it may be sufficient. I’d need to get in touch with Kip Macy to confirm/deny.
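The two slice tables in the list above can be checked with a bit of arithmetic. The Python sketch below uses only the numbers printed by the slice editor and assumes the classic 255-head, 63-sector translation geometry (an assumption on my part). Under that assumption the old slice ended exactly on a cylinder boundary, and the trailing "unused" region is simply what was left over, too small to hold another full cylinder.

```python
HEADS, SECTORS_PER_TRACK = 255, 63             # assumed translation geometry
SECTORS_PER_CYL = HEADS * SECTORS_PER_TRACK    # 16065 sectors per cylinder

OFFSET = 63              # first sector of the slice in both tables
OLD_SIZE = 390716802     # slice size before RC2
NEW_SIZE = 390721905     # slice size with RC2
OLD_UNUSED = 5103        # trailing "unused" region before RC2

old_end = OFFSET + OLD_SIZE         # first sector after the old slice
disk_sectors = OFFSET + NEW_SIZE    # total sectors on the disk

on_boundary = old_end % SECTORS_PER_CYL == 0
leftover = disk_sectors - old_end
gained = NEW_SIZE - OLD_SIZE

print(on_boundary)                  # old slice ended on a cylinder boundary
print(leftover, leftover < SECTORS_PER_CYL)
print(gained)                       # sectors the RC2 layout adds to the slice
```

So the RC2 layout gains exactly the 5103 sectors that used to be stranded at the end of the disk.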

Issues that are still pending:

  • bsdlabel still behaves incorrectly (“Class not found”). Instead, users should use gpart to write the bootstraps as follows: gpart bootcode [disk], where [disk] is ad4 or similar. Note that you pick the disk itself now, not the slice like in bsdlabel (unless you were using dangerously dedicated disks :-) ).
  • The ZFS notice pertaining to vfs.zfs.prefetch_disable when the system has less than 4GB RAM available has been re-worded again, but still is vague/unclear. A little bit of ego here — the person committing these changes should really consider changing the message to what I proposed.
  • I still haven’t received a reply to my request for clarification on ZFS stabilisation. Is /boot/loader.conf tuning for kmem-related parameters still required? We still need an official statement on this matter.

I also want to take a moment to send a shout-out to John Baldwin, who has been working incredibly hard on the FreeBSD kernel (specifically VM and ACPI) over the past 4 weeks. John, I’ve seen/followed your commits, and I appreciate the improvements! Thank you!

UNIX mail format annoyances

For many years now I’ve been dealing with an ongoing issue which still to this day has no real solution: with classic UNIX mailboxes (called mbox), mail clients compare the file’s mtime to its atime to determine whether there’s new mail inside the mailbox. If the mtime is greater than the atime, there’s new mail; if the mtime is smaller than the atime, the mail has been read or there is no new mail. “Classic mail spools” (e.g. /var/mail or /var/spool/mail) are mbox.
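To make the heuristic concrete, here is a minimal Python sketch of the comparison. It is not taken from mutt or alpine (real clients may apply additional checks); the function name and the timestamps are made up for illustration. The second os.utime call simulates something reading the mailbox after delivery, which is exactly what a backup tool does.

```python
import os
import tempfile

def has_new_mail(path: str) -> bool:
    """Classic mbox heuristic: new mail if modified more recently than read."""
    st = os.stat(path)                # stat itself does not update the atime
    return st.st_mtime > st.st_atime

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"From someone@example.org Wed Oct 28 06:27:05 2009\n\nhello\n")
    mbox = f.name

os.utime(mbox, (1000, 2000))          # atime=1000, mtime=2000: delivered after last read
before_backup = has_new_mail(mbox)

os.utime(mbox, (3000, 2000))          # atime bumped to 3000, as if a backup read the file
after_backup = has_new_mail(mbox)

os.remove(mbox)
print(before_backup, after_backup)
```

The mailbox content never changed between the two checks; only the atime did, and the "new mail" indication is gone.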

Why is this a problem? Because those of us who use mutt/alpine/etc. on our UNIX machines, who also do backups using things like tar/cp/rsync (more on rsync in a moment) end up with mailboxes with a lost/clobbered atime after the backup takes place. The end result: our mail clients no longer tell us there’s new mail in that mailbox, which can be detrimental in many respects.

The most common rebuttal is “shut up and use Maildir”. What Maildir advocates don’t care to acknowledge is that there are many problems with the Maildir concept, particularly when used on a filesystem like ZFS. With classic mbox, your multi-megabyte mailboxes load quickly. Maildir, on the other hand, uses a single file per mail, and the end result is a mail client that takes forever to open a large mailbox. ZFS does not perform well when it comes to massive numbers of small files.

UFS/UFS2, ext2fs/ext3fs, and other filesystems don’t have this problem, but let’s pull our heads out for a moment (since tunnel vision/ostrich syndrome is what got us here in the first place!) — we’re entering year 2010 and ZFS is already being used heavily by Solaris/OpenSolaris and FreeBSD users across the globe; ZFS is here to stay, end of discussion. There are some proposed solutions such as making use of ZFS’s semi-new L2ARC to add an additional layer of caching using dedicated low-latency devices (specifically SSDs), but there’s been no actual evidence this improves things with Maildir. And besides, who in their right mind is going to go out and drop hundreds of dollars on an Intel X25-M per machine just to solve this problem? Seriously.

And let’s not forget administrators who mount their filesystems with the noatime mount flag for added performance benefits, especially on a journalled filesystem.

One workaround proposed for mutt users involves recompiling mutt to use Oracle/SleepyCat DB, GDBM, or Tokyo Cabinet to maintain a cache of mailbox headers (using the header_cache directive), thus speeding up the process. Does this help? Yes, there’s a decent improvement, but anyone who uses this method (such as me) can tell you that it’s still nowhere near as fast as classic mbox, especially when you’ve got a mailbox with a couple hundred new mails in it.

Does the saga end here? Not even close.

There’s a new mailbox format, called MIX, which is being used within alpine. This format is more or less a combination of mbox and Maildir, and performs much better than Maildir. Sounds great, right? Except those of us who use mutt are out of luck — unsupported, and there’s been absolutely no discussion of it since February 2007. Even the author of mutt, Michael Elkins, had nothing useful to say other than snide comments. Oh, and MIX isn’t supported in procmail or Sieve either — double whammy. But MIX does sound like the way to go — too bad it isn’t getting the attention it should.

Some administrators using ZFS are using ZFS snapshots to do their backups instead of something like rsync, which is great except that snapshots are hit-or-miss (reliability-wise) on FreeBSD, or at least that’s what I last read 6-9 months ago, while rsync is filesystem-independent. Most folks I know who run into snapshot problems go back to rsync.

So what now? With all the above in mind, I decided to poke at rsync, because there have been many discussions in the past on the mailing lists about getting rsync to preserve file atime. rsync out-of-the-box will preserve mtime when using the --times flag (ctime cannot be set by an application, so it is never preserved). However, there’s a patch called atimes.diff which comes with the rsync-patches tarball that provides a --atimes flag that supposedly solves this. Sounds great… except there’s one problem…

The flag does cause the atimes of the source file to be copied to the destination, but the atimes of the source file are lost! And here’s a more recent confirmation.

If that’s not enough, here’s final confirmation. Note that I’m using non-zero-byte files intentionally; rsync behaves differently when the files are zero bytes.

rsync -a:

$ echo "hello" > source
$ stat -x source
Access: Wed Oct 28 06:27:05 2009
Modify: Wed Oct 28 06:27:05 2009
Change: Wed Oct 28 06:27:05 2009
$ rsync -a source dest
$ stat -x source
Access: Wed Oct 28 06:27:29 2009
Modify: Wed Oct 28 06:27:05 2009
Change: Wed Oct 28 06:27:05 2009
$ stat -x dest
Access: Wed Oct 28 06:27:29 2009
Modify: Wed Oct 28 06:27:05 2009
Change: Wed Oct 28 06:27:29 2009
$ rm source dest

Above, we see that after the rsync, the atime in the source file is lost, and the ctime in the destination file does not match that of the source — only the mtime is retained.

rsync -a --atimes:

$ echo "hello" > source
$ stat -x source
Access: Wed Oct 28 06:32:50 2009
Modify: Wed Oct 28 06:32:50 2009
Change: Wed Oct 28 06:32:50 2009
$ rsync -a --atimes source dest
$ stat -x source
Access: Wed Oct 28 06:34:06 2009
Modify: Wed Oct 28 06:32:50 2009
Change: Wed Oct 28 06:32:50 2009
$ stat -x dest
Access: Wed Oct 28 06:32:50 2009
Modify: Wed Oct 28 06:32:50 2009
Change: Wed Oct 28 06:34:06 2009

Above, we see the atime and the mtime of the source file are retained in the destination. However, again, the atime in the source file is lost and the ctime doesn’t match that of the source.

cp -p:

$ echo "hello" > source
$ stat -x source
Access: Wed Oct 28 06:37:56 2009
Modify: Wed Oct 28 06:37:56 2009
Change: Wed Oct 28 06:37:56 2009
$ cp -p source dest
$ stat -x source
Access: Wed Oct 28 06:38:27 2009
Modify: Wed Oct 28 06:37:56 2009
Change: Wed Oct 28 06:37:56 2009
$ stat -x dest
Access: Wed Oct 28 06:37:56 2009
Modify: Wed Oct 28 06:37:56 2009
Change: Wed Oct 28 06:38:27 2009

With cp -p, we see identical behaviour to that of rsync -a --atimes.

Some may be wondering: “is it even possible to solve this problem?” Of course it is. The logic flow should be pretty obvious at this point:

  1. stat(2) or fstat(2) the source file and save (in memory) the atime, mtime, and ctime. Neither call modifies the atime
  2. Copy the source file to the destination file
  3. Set the atime and mtime of the destination file using utimes(2) with the previously-obtained values (the ctime cannot be set this way; the kernel updates it on its own)
  4. Set the atime and mtime of the source file using utimes(2) with the previously-obtained values

You can accomplish the same thing with touch.
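The numbered steps above can be sketched in a few lines of Python. This is an illustration of the logic only, not a patch for rsync or cp: os.stat and os.utime are Python’s wrappers around the same family of system calls, the function name is mine, and the timestamps in the demo are arbitrary. Note that only atime and mtime are restored; ctime cannot be set from userland, and the utime calls themselves move it to "now".

```python
import os
import shutil
import tempfile

def copy_keeping_times(src: str, dst: str) -> None:
    """Copy src to dst, then put the saved atime/mtime back on both files."""
    st = os.stat(src)                                    # step 1: remember the times
    shutil.copyfile(src, dst)                            # step 2: reading src may bump its atime
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))   # step 3: destination
    os.utime(src, ns=(st.st_atime_ns, st.st_mtime_ns))   # step 4: source

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source")
dest = os.path.join(workdir, "dest")
with open(source, "w") as f:
    f.write("hello\n")
os.utime(source, (1256711225, 1256711000))   # arbitrary whole-second atime, mtime

copy_keeping_times(source, dest)
src_st, dst_st = os.stat(source), os.stat(dest)
print(int(src_st.st_atime), int(src_st.st_mtime))
print(int(dst_st.st_atime), int(dst_st.st_mtime))
shutil.rmtree(workdir)
```

One caveat with this approach: if new mail is delivered between step 1 and step 4, restoring the old mtime on the source would hide it, so a real implementation would need locking or a re-check before step 4.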

And let’s not forget that FreeBSD lacks the O_NOATIME GNU extension for open(2), which was proposed in 1998.

So is there a solution to all of this? As far as I’ve been able to tell, no, there isn’t. Using filesystem-level snapshots appears to be the only way to “solve” this issue. I’d be much happier if the --atimes patch for rsync did what it was supposed to… but it’s 23KB, and I’m not familiar with the rsync code (it’s not as black-and-white as one may think).

We UNIX folks should be ashamed of this whole debacle. There isn’t a better way to say it: what a clusterfuck.

Testing out FreeBSD 8.0-RC1

EDIT: Those interested in the upcoming release of FreeBSD 8.0 should read both the below, as well as my Testing out FreeBSD 8.0-RC2 post (which notes that many, but not all, of these problems have been fixed).

Yesterday I took the plunge and upgraded my home FreeBSD amd64 box from RELENG_7 to FreeBSD 8.0-RC1, which going forward I will refer to as RELENG_8 (yes, it has been tagged!). I did a complete reinstall, like I always do when migrating between major FreeBSD releases. Said box consists of a Supermicro X7SBA motherboard, Intel Core2Duo E8400 CPU, 4GB of RAM, and 3 SATA disks connected via the on-board Intel ICH9 + AHCI — one for the OS, and two in a ZFS mirror pool. Relevant dmesg information:

CPU: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (2992.52-MHz K8-class CPU)
real memory  = 4294967296 (4096 MB)
avail memory = 4112478208 (3921 MB)
atapci0: <Intel ICH9 SATA300 controller> port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdc000800-0xdc000fff irq 17 at device 31.2 on pci0
atapci0: [ITHREAD]
atapci0: AHCI called from vendor specific driver
atapci0: AHCI v1.20 controller with 6 3Gbps ports, PM supported
ata4: <ATA channel 2> on atapci0
ata4: [ITHREAD]
ata5: <ATA channel 3> on atapci0
ata5: [ITHREAD]
ata7: <ATA channel 5> on atapci0
ata7: [ITHREAD]
ad8: 190782MB <WDC WD2000JD-00HBB0 08.02D08> at ata4-master SATA150
ad10: 953869MB <WDC WD1001FALS-00J7B1 05.00K05> at ata5-master SATA300
ad14: 953869MB <WDC WD1001FALS-00J7B1 05.00K05> at ata7-master SATA300

And ZFS-related details:

  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad14    ONLINE       0     0     0

errors: No known data errors

The first thing worth noting is that I performed the installation entirely using the FreeBSD 8.0-RC1 amd64 memstick image on a SanDisk Cruzer Micro 2GB USB drive. If you’re going to try this, make sure you have a 2GB or larger USB drive, since the memstick image is larger than 1GB.

Writing the .img file to the USB drive required the use of another FreeBSD box (someone is going to have to address that fact eventually), and was achieved per Ken Smith’s original instructions:

dd if=8.0-RC1-amd64-memstick.img of=/dev/da0 bs=10240 conv=sync

The entire dd took about 4 minutes. I then booted it directly without any issues. Installation went as expected, with the exception of choosing a different installation medium than usual; there’s a new USB menu item near the bottom of the installation medium selection list. I should also note that I deleted the existing FreeBSD partition and re-created it during the sysinstall phase, and during the Boot Manager selection phase, I chose Standard like I always do. You’ll understand why I’ve noted these two things in a moment.

After rebooting + booting off the main OS hard disk, the first thing I saw which was different/anomalous was that I was being shown the F1/F5/F6 FreeBSD boot manager menu — as if I had selected BootMgr and not Standard. Options shown were F1 for FreeBSD, F5 for Floppy, and F6 for PXE (that’s a new one!). So there’s definitely a bug/regression somewhere with regards to the boot manager you choose; or maybe it was because I was installing from a USB drive? Not sure.

The FreeBSD box booted fine, but the following kernel message caught my eye:

GEOM: ad8s1: geometry does not match label (255h,63s != 16h,63s).

This indicated that what sysinstall wrote to the actual on-disk BSD label, as far as drive geometry went, didn’t match what GEOM expected — GEOM expecting 16 heads, the drive label containing 255 heads. I’m not using GPT (at least not knowingly; if sysinstall does it without telling you, then someone needs to work out where the actual problem lies). I first looked at gpart show ad8 and gpart show ad8s1 to see what it claimed:

# gpart show ad8
=>       63  390721905  ad8  MBR  (186G)
         63  390716802    1  freebsd  [active]  (186G)
  390716865       5103       - free -  (2.5M)

# gpart show ad8s1
=>        0  390716802  ad8s1  BSD  (186G)
          0    4194304      1  freebsd-ufs  (2.0G)
    4194304   16777216      2  freebsd-swap  (8.0G)
   20971520   33554432      4  freebsd-ufs  (16G)
   54525952   16777216      5  freebsd-ufs  (8.0G)
   71303168  319413634      6  freebsd-ufs  (152G)

Next, and more conclusive, I used bsdlabel -e -A ad8s1 and this is what I got:

# /dev/ad8s1:
type: ESDI
disk: ad8s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 24321
sectors/unit: 390721968
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:  4194304        0    4.2BSD        0     0     0
  b: 16777216  4194304      swap
  c: 390716802        0    unused        0     0         # "raw" part, don't edit
  d: 33554432 20971520    4.2BSD        0     0     0
  e: 16777216 54525952    4.2BSD        0     0     0
  f: 319413634 71303168    4.2BSD        0     0     0

So yes, sysinstall is definitely doing the Wrong Thing(tm) here. One of the biggest problems with this is that users cannot easily fix this — it requires booting a fixit or LiveFS image and editing the disk label, as well as users having to re-calculate sectors/cylinder and cylinders by hand. And no, I’m not talking out of my ass. Also note that the post to freebsd-current is dated March 2009 — 7 months ago. So this issue has existed for quite some time without proper attention.
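For anyone stuck doing that recalculation, the arithmetic itself is short. A Python sketch, assuming the goal is to make the label agree with the 16 heads and 63 sectors per track that GEOM reports, and taking sectors/unit from the label above. Whether editing these fields is the right fix at all is a separate question this sketch does not answer.

```python
SECTORS_PER_TRACK = 63            # sectors/track, same in both geometries
SECTORS_PER_UNIT = 390721968      # sectors/unit from the label

def geometry(heads: int) -> tuple[int, int, int]:
    """Return (sectors/cylinder, whole cylinders, leftover sectors)."""
    per_cyl = heads * SECTORS_PER_TRACK
    cylinders, leftover = divmod(SECTORS_PER_UNIT, per_cyl)
    return per_cyl, cylinders, leftover

label = geometry(255)   # what sysinstall wrote: 255 heads
geom = geometry(16)     # what GEOM expects: 16 heads

print("label:", label)
print("GEOM: ", geom)
```

With 255 heads this reproduces the 16065 sectors/cylinder and 24321 cylinders seen in the label, with 5103 sectors left over; with 16 heads the disk divides evenly into 1008-sector cylinders.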

Upon exiting the editor session inside of bsdlabel (using :q!), I was given the following error:

bsdlabel: partition c doesn't cover the whole unit!
bsdlabel: An incorrect partition c may cause problems for standard system utilities
bsdlabel: Class not found
re-edit the label? [y]: n

Whoa whoa whoa, even more insanity going on here! Also, what’s with the Class not found error? Wow, this is pretty jacked, and apparently I’m not the only one who’s noticed. Note this post is dated July 2009 — 3 months ago. The easy solution seems to be to use gpart(8) to create all of the slices… except it’s obvious no one has fixed sysinstall to do this. There’s also a patch mentioned which fixes the problem, but that obviously wasn’t committed before the RELENG_8 tagging, nor backported since.

But the root of the problem does appear to be sysinstall not doing the Right Thing(tm) any longer.

Next up was my attempt to fix the Boot Manager oddities. Historically, re-writing the boot blocks on a disk consisted of doing the appropriate equivalent of bsdlabel -B ad8s1. However, I was greeted with the exact same “Class not found” error as above. Hmmm… This doesn’t bode well. Presumably I can use gpart(8) to re-write the boot blocks, but given that GEOM is complaining about disk geometry errors, I don’t dare mess with it.

All of this is a pretty major bug. The kernel message is going to catch a lot of user attention if it’s not fixed by the time 8.0-RELEASE is announced. I plan on sending Robert Watson and Ken Smith an Email about the issue after I get done writing this.

At this point I decided to make appropriate modifications to /etc/rc.conf, specifically the addition of zfs_enable="yes", so that I could get access to my ZFS filesystems. After doing so, and running /etc/rc.d/hostid start then /etc/rc.d/zfs start, I was greeted with the usual ZFS kernel messages — but with an unexpected surprise:

ZFS NOTICE: system has less than 4GB and prefetch enable is not set... disabling.

Given how familiar I am with FreeBSD and ZFS at this point, this message caught me off-guard.

First of all, this sentence is confusing as hell: there is no “prefetch enable” tunable. The tunable is actually called vfs.zfs.prefetch_disable, and it defaults to 0 (off, i.e. prefetching enabled), so why would I explicitly enable something which is enabled by default? And why’s it getting disabled? Secondly, my system has 4GB of RAM installed… so what’s going on here?!

I dug around in the relevant CVS commit logs and found numerous changes to this file, specifically the message that was printed. Apparently the existing message was reviewed by 5 separate people (revision 1.23) to “Improve wording”.

#ifdef _KERNEL
        if (TUNABLE_INT_FETCH("vfs.zfs.prefetch_disable", &zfs_prefetch_disable))
                prefetch_tunable_set = 1;

...
        if ((((uint64_t)physmem * PAGESIZE) < (1ULL << 32)) &&
            prefetch_tunable_set == 0) {
                printf("ZFS NOTICE: system has less than 4GB and prefetch enable is not set"
                    "... disabling.\n");
                zfs_prefetch_disable=1;
        }
#endif

Ahh, now we have a much better idea of what’s going on. There are two reasons why this message got printed on my machine:

1) I had not done any tuning of /boot/loader.conf at this point, so vfs.zfs.prefetch_disable hadn’t been set. The above code basically says “if someone has administratively set vfs.zfs.prefetch_disable to something in loader.conf, set prefetch_tunable_set to 1”. You can set the tunable to whatever you want (enabled or disabled) and the message won’t get printed. If you don’t set the tunable, the following applies:

2) physmem is actually the amount of memory in pages that’s available to the kernel when it loads. The multiplication is actually hw.availpages * hw.pagesize. The 1ULL << 32 statement may look ugly but it’s a bitshift equivalent of 2^32, i.e. 4294967296.

Let’s work out the math:

# sysctl hw.pagesize hw.availpages
hw.pagesize: 4096
hw.availpages: 1046201
# expr 1046201 "*" 4096
4285239296

4285239296 is indeed less than 4294967296. Wait a minute… where’s that extra memory going?

Well, it’s going to two places on the X7SBA: 1) on-board video (which has an 8MB framebuffer), and 2) the AHCI BIOS which takes up an unknown amount of RAM, but I’d guess about 1-2MB. So let’s do the math:

# expr 4294967296 - 4285239296
9728000
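Expressed in binary megabytes, the shortfall lines up reasonably well with that guess. A Python sketch using the sysctl values above; the 8MB framebuffer figure is the one quoted earlier, and attributing whatever remains to the AHCI BIOS is a guess rather than a measurement.

```python
PAGE_SIZE = 4096          # hw.pagesize
AVAIL_PAGES = 1046201     # hw.availpages
FOUR_GB = 1 << 32         # the threshold in the kernel check, 4294967296
MIB = 1024 * 1024

usable = AVAIL_PAGES * PAGE_SIZE
shortfall = FOUR_GB - usable
framebuffer = 8 * MIB     # on-board video, per the figure quoted above

below_threshold = usable < FOUR_GB
shortfall_mib = round(shortfall / MIB, 2)
remainder_mib = round((shortfall - framebuffer) / MIB, 2)

print(below_threshold)    # True means the kernel check disables prefetch
print(shortfall_mib)      # total shortfall in MiB
print(remainder_mib)      # what is left once the framebuffer is accounted for
```

Roughly 9.3MB is missing in total, which leaves a bit over 1MB after the framebuffer, in line with the 1-2MB estimate.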

With all of this information kept in mind, the kernel message really should be re-worded to say the following:

ZFS NOTICE: System has less than 4294967296 bytes (4GB) of usable memory,
ZFS NOTICE: and vfs.zfs.prefetch_disable has not explicitly been defined
ZFS NOTICE: in loader.conf.  Setting vfs.zfs.prefetch_disable="1"...

I’m also questioning the logic behind why prefetching is disabled on systems with less than 4GB of available memory; I’d like to know what the reasoning is there. Is it in regards to stability? Performance? I don’t know. I can’t find an answer on the mailing lists either.

Finally, I found an unexpected oddity with the new tty/pty/pts code with regards to EOF. All other operating systems, including RELENG_7 and earlier, behave as follows when EOF is pressed on a terminal. This is regardless of shell, by the way:

bash$ cat
{press Control-D here}bash$

While on RELENG_8, the literal Control-D (^D) character is shown on-screen:

bash$ cat
{press Control-D here}^Dbash$

I’ve already mailed Ed Schouten about this, and he agrees it’s a bug which he’ll work on fixing, hopefully tonight.

Everything else past this point was peachy keen. No odd problems building ports, no system lock-ups or odd experiences, and so on. It’s all worked great so far. I’m looking forward to upgrading our production servers to RELENG_8 when it comes out.

Posted in FreeBSD, ZFS.

FreeBSD and ZFS — is it truly stable?

There’s an “age old question” that has been floating around with regards to ZFS on FreeBSD — is it stable? “Stable” in this case means: do I risk losing my data, will it cause kernel panics or other oddities, and do I need to tune it?

The answer to each of those, still, may be yes.

I’ve taken the initiative to request an official response to these types of questions, specifically with regards to kernel panics. I’m incredibly surprised no one, not even the user community, has responded at this point. It’s not a trick question either; FreeBSD users really do need an answer to this.

People are continually comparing FreeBSD’s ZFS to that of Solaris 10 and OpenSolaris’ ZFS. Given that my day job involves heavy use of Solaris 10 on massive numbers of servers across the United States, I can safely say without a doubt ZFS on Solaris behaves better and won’t crash your system due to kernel memory exhaustion.

FreeBSD and ZFS — NFS bug fixed on RELENG_7 amd64

I just saw this commit come through for RELENG_7:

 Edit src/sys/nfsserver/nfs_serv.c
  Add delta 1.174.2.8 2009.07.01.12.44.23 avg

The CVS commit log indicates this fixes a bug where NFS is being used on ZFS v13 exported filesystems, and the system mounting the NFS share attempts to open(2) with flags O_CREAT and O_EXCL set. The file is created — 0 bytes in size, with mode 0000 — yet the operation returns EIO. This is pretty major:

src/sys/nfsserver/nfs_serv.c
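For readers unfamiliar with that flag combination, this is roughly what the client side looks like. The Python sketch below runs against a local temporary directory, so it only shows the expected semantics (the first exclusive create succeeds, a second one fails with EEXIST). It does not reproduce the bug, which needed an NFS mount backed by ZFS v13, and the file name is made up.

```python
import errno
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "lockfile")     # hypothetical file name

# Exclusive create: succeed only if the file does not exist yet.
fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
os.close(fd)
first_ok = True

try:
    os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644))
    second_errno = None
except OSError as e:
    second_errno = e.errno                   # EEXIST on a healthy filesystem

print(first_ok, second_errno == errno.EEXIST)
os.remove(path)
os.rmdir(workdir)
```

Lock files and safe temporary-file creation commonly rely on this pattern, which is why a create that leaves a file behind yet returns EIO is so disruptive.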

I also enjoyed reading the PR for this bug, where some developers made some amazing statements (my favourite being “use UFS2 instead of ZFS, use cp/rm instead of mv, don’t use NFS”):

http://www.freebsd.org/cgi/query-pr.cgi?pr=135412

This further validates my concern that there isn’t enough QA being done prior to code being committed to the STABLE branches. I guess no one tested ZFS v13 filesystems being exported via NFS prior to the v13 commit?

Believe me, I’m thankful that ZFS v13 is now part of STABLE — sincerely I am — but my concern isn’t limited to ZFS: it applies to FreeBSD as a whole.

ports/editors/vim — unacceptable patch methodology

UPDATE: It seems the port maintainer decided that he should rename the patch file to 7.2.041^. Yes, that’s right, a caret on the end. This addresses the problem with URI escaping for people who pull patches locally via HTTP, but why he chose this naming convention is beyond me. Given the Makefile’s non-complexity, is it that hard to call the patch 7.2.041-freebsd-ports? Or better yet, why not commit it to CVS — files/ does exist for a reason…

This also applies to ports/editors/vim-lite, obviously.

It seems a recent change to the vim port has introduced a “custom” patch file, called 7.2.041% (note the percent sign at the end). Another user has opened up a PR on this matter, but the responsible individual has not responded in a month:

http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/135689

Rather than accept silence, I discussed the matter with flz@freebsd.org directly, who initially committed a fix I recommended (removing the PATCHFILES line). This fix was quickly backed out — because apparently the 7.2.041% patch does exist, just not on any of the official vim mirrors.
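As an aside on why a name like this is awkward over HTTP: the percent sign is the escape character in URLs, so a literal one has to be sent as %25, and a bare trailing % is not a valid escape sequence. That is my reading of the "Bad Request" responses, not something confirmed by the port maintainers. The Python sketch below simply shows what a strict encoder does with the three file names discussed in this post.

```python
from urllib.parse import quote

names = ("7.2.041%", "7.2.041^", "7.2.041-freebsd-ports")

# Percent-encode each name the way a strict URL encoder would.
encoded = {name: quote(name) for name in names}

for name, url_form in encoded.items():
    print(name, "->", url_form)
```

A strict encoder escapes the caret as well, but a raw caret is at least not the start of a malformed escape sequence, whereas the suggested -freebsd-ports name needs no escaping at all.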

So where is this patch, and why can’t it be fetched? Well, it can be… you just have to wait 5 full minutes to go through all the mirrors, many of which time out or exhibit odd behaviour (note the Israeli server which results in “protocol error”):

# cd /usr/ports/editors/vim-lite
# time make fetch
=> 7.2.041% doesn't seem to exist in /usr/ports/distfiles/vim.
=> Attempting to fetch from http://ftp.vim.org/pub/vim/patches/7.2/.
fetch: http://ftp.vim.org/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://ftp.vim.org/pub/vim/patches/7.2/.
fetch: http://ftp.vim.org/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://mirrors.24-7-solutions.net/pub/vim/patches/7.2/.
fetch: http://mirrors.24-7-solutions.net/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://ftp.tw.vim.org/pub/vim/patches/7.2/.
fetch: http://ftp.tw.vim.org/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://vim.stu.edu.tw/patches/7.2/.
fetch: http://vim.stu.edu.tw/patches/7.2/7.2.041%: Not Found
=> Attempting to fetch from http://gd.tuwien.ac.at/pub/vim/patches/7.2/.
fetch: http://gd.tuwien.ac.at/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://www.etsimo.uniovi.es/pub/vim/patches/7.2/.
fetch: http://www.etsimo.uniovi.es/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://www.pt.vim.org/pub/vim/patches/7.2/.
fetch: http://www.pt.vim.org/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://www.pangora.org/vim.org/pub/vim/patches/7.2/.
fetch: http://www.pangora.org/vim.org/pub/vim/patches/7.2/7.2.041%: Operation timed out
=> Attempting to fetch from http://www.math.technion.ac.il/pub/vim/patches/7.2/.
fetch: http://www.math.technion.ac.il/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://vim.fyxm.net/pub/vim/patches/7.2/.
fetch: http://vim.fyxm.net/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://zloba.ath.cx/pub/vim/patches/7.2/.
fetch: http://zloba.ath.cx/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://ftp2.uk.vim.org/sites/ftp.vim.org/pub/vim/patches/7.2/.
fetch: http://ftp2.uk.vim.org/sites/ftp.vim.org/pub/vim/patches/7.2/7.2.041%: Bad Request
=> Attempting to fetch from http://vim.mirror.fr/patches/7.2/.
fetch: http://vim.mirror.fr/patches/7.2/7.2.041%: Operation timed out
=> Attempting to fetch from ftp://ftp.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp2.us.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp2.us.vim.org/pub/vim/patches/7.2/7.2.041%: Operation timed out
=> Attempting to fetch from ftp://ftp9.us.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp9.us.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.ca.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.ca.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.nl.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.nl.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.de.vim.org/patches/7.2/.
fetch: ftp://ftp.de.vim.org/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp3.de.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp3.de.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.uk.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.uk.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.ie.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.ie.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.at.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.at.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.pt.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.pt.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.is.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.is.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.il.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.il.vim.org/pub/vim/patches/7.2/7.2.041%: Protocol error
=> Attempting to fetch from ftp://ftp.pl.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.pl.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.ro.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.ro.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.sk.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.sk.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.tw.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.tw.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://vim.stu.edu.tw/pub/vim/patches/7.2/.
fetch: ftp://vim.stu.edu.tw/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.jp.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.jp.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.kr.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.kr.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.mirrorservice.org/sites/ftp.vim.org/pub/vim/patches/7.2/.
fetch: ftp://ftp.mirrorservice.org/sites/ftp.vim.org/pub/vim/patches/7.2/7.2.041%: File unavailable (e.g., file not found, no access)
=> Attempting to fetch from ftp://ftp.FreeBSD.org/pub/FreeBSD/ports/local-distfiles/obrien/.
7.2.041%                                      100% of   21 kB   34 kBps
0.256u 0.506s 4:58.00 0.2%      129+1071k 0+0io 0pf+0w

Apparently the 7.2.041% patch is a FreeBSD-specific version of the official 7.2.041 patch which is on the mirrors. But what I’d like to know is why this custom FreeBSD-specific 7.2.041 patch was not committed to CVS (e.g. files/7.2.041%). Why must it reside in the maintainer’s public_distfiles? This is silly.

Likewise, I think the existing methodology of vim updates is badly flawed; for example, there were over 300 patches for 7.0 before 7.1 was released. This isn’t the fault of FreeBSD, but rather of Bram Moolenaar.

FreeBSD and ZFS — more performance quirks

I’ve been trying to work out the source of “strange” performance-related issues when using ZFS on FreeBSD. The situation is reproducible, but not always, and it seems the complex inner nature of ZFS is probably contributing to the problem.

The problem I’m trying to track down is what causes fread() operations, with 4KB reads, to suddenly start taking an absurd amount of time (0.2 seconds, for example), on a ZFS raidz1 pool with 4 disks all capable of operating at 80-100MB/sec (read and write).

My understanding is that the slowness is expected initially as ZFS’s ZIL and ARC do not have any reference to this data the first time around. And that’s fine — not to mention, seems true. For example, today I performed the following test:

1) rm Maildir/header_cache.db
2) mutt -y
3) Ran through all of the folders, thus populating the Maildir header cache. A folder with 400 files in it would take maybe 10 full seconds. Then I exited mutt.
4) Again, rm Maildir/header_cache.db
5) mutt -y
6) Once more, ran through all the folders. This time things were much faster, almost instantaneous. Populating the header cache was quick because ZFS now had at least some of the file data cached in the ARC, or possibly references to the system calls in the ZIL.

This is an ideal situation, and makes sense. The ARC has a tendency to grow very large as more and more I/O happens, and that’s fine — that’s the nature of the ZFS beast.

However, where things get bizarre is when the above “logical” scenario stops occurring, with ZFS acting as if the ARC or ZIL is “full” and choosing not to cache anything more for some reason. Again, I can reproduce this quite easily using the above procedure, but it’s not a very good real-world test to post to a mailing list. It’s getting to the point where I’m probably going to have to write some C code that mimics all of the scenarios possible. There is definitely a problem somewhere, and I think that’s the only way I’m going to get people to track it down.
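
As a rough idea of what such a test program might look like, here is a minimal sketch. It is written in Python rather than C purely for brevity, so it exercises unbuffered read() calls rather than stdio’s fread(). The function names (time_reads, slow_reads) and the 0.2 second threshold are my own illustrative assumptions, not part of any existing tool.

```python
import time

BLOCK_SIZE = 4096       # 4KB reads, matching the read size described above
SLOW_THRESHOLD = 0.2    # seconds; assumed cut-off for an "absurdly slow" read


def time_reads(path, block_size=BLOCK_SIZE):
    """Read a file in block_size chunks and return the duration of each read."""
    durations = []
    # buffering=0 gives a raw file object, so each read() is one request
    # to the operating system instead of being served from a Python buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            start = time.perf_counter()
            chunk = f.read(block_size)
            elapsed = time.perf_counter() - start
            if not chunk:
                break  # end of file; the final empty read is not recorded
            durations.append(elapsed)
    return durations


def slow_reads(durations, threshold=SLOW_THRESHOLD):
    """Return (read index, duration) pairs for reads at or above threshold."""
    return [(i, d) for i, d in enumerate(durations) if d >= threshold]
```

Calling time_reads() on every file in a Maildir folder, twice in a row, and comparing the slow_reads() results of the first (cold) and second (warm) pass would roughly mimic the mutt procedure above without involving mutt itself.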

But something occurred to me today while reading the Cache Flushes section of the ZFS Evil Tuning Guide. I was left wondering if setting the FreeBSD equivalent of zfs_nocacheflush would improve things.

So today I decided to set vfs.zfs.cache_flush_disable=1 in /boot/loader.conf to see how things performed.

Something tells me I’m probably going to have to create a ZFS-related WordPress Category in my blog just for this kind of thing. :-)

ZFS support in loader(8) being continually added/removed

FreeBSD users should be aware of the massive rash of commits which have occurred over the past few weeks with regard to LOADER_ZFS_SUPPORT functionality. This functionality has been added, removed, tinkered with, re-added, removed, etc. numerous times. Proof is provided below. As of this writing, LOADER_ZFS_SUPPORT has been disabled entirely. Please see these commits:

This affects both i386 and amd64, despite the pathname implying otherwise.

FreeBSD users should be outraged by this, and should be questioning why these changes are not being fully tested before being committed. I’ll take this as further proof that all administrators should be paying VERY close attention to commits to src-all in RELEASE and STABLE branches.

I consider this evidence further justification for keeping one’s root filesystem as UFS.

FreeBSD and ZFS — horrible raidz1 speed — finale

A follow-up to the following two posts of mine:

The problem I described has not recurred since enabling prefetch. So it seems whatever performance-related problems we had with prefetch when ZFS was first committed to FreeBSD “back in the day” have since been addressed. I wish I could pinpoint where/when/how this was fixed, but the beast is complex…

I’ve since re-enabled prefetch on my co-located production servers (Intel ICH7-based) and I’m seeing great improvements there too. Those are single-disk systems (i.e. no raidz1 in use).

I’d recommend that users who previously disabled the ZFS prefetch mechanism on FreeBSD re-enable it and reboot. :-)
