getmail 4.15.0 stuck in an infinite mmap loop

Tonight right before leaving for work, I caught my FreeBSD box on my LAN behaving oddly (slow response time). I’d provide top output but it’s not in my scroll-back; each process below was taking up ~100% CPU. ps -auxwwww showed the following:

jdc     21282 20.4 24.4 3206516 1021080  ??  Rs   10:31pm   1:02.78 /usr/local/bin/python2.6 /usr/local/bin/getmail --quiet
jdc     21264 14.4 22.2 3085684  930448  ??  Rs   10:30pm   1:00.51 /usr/local/bin/python2.6 /usr/local/bin/getmail --quiet
jdc     21243 11.8 33.2 4277620 1388616  ??  RLs  10:29pm   1:14.18 /usr/local/bin/python2.6 /usr/local/bin/getmail --quiet

Numerous things to note here:

  • There’s multiple getmail clients running at once. I use getmail via a cron job, so that explains the 1 minute differential between process start times — but what doesn’t make any sense is why getmail is permitting multiples being run at once. (I only poll from a single IMAP mailbox)
  • The RSS size of the processes is through the roof. This would grow quickly, then decrease slightly, then grow quickly again, etc.
  • When I truss‘d the 21264 process to find out what the hell it was doing, all I saw were constant, non-stop mmap(2) calls, with an offset that was continually increasing.
  • Memory on the machine had not been exhausted, but obviously had I not caught this, all sorts of insanity would have happened (processes all over the dying off as a result of ENOMEM)

I ended up killing the processes off, which restored everything to its previous state — or, well, mostly. I had a couple processes with a PPID of 1 (init), indicating they were zombies waiting to be reaped. I also had a bunch of unrelated processes with 0KB RSS, indicating the entire process had been swapped to disk to provide more memory for the broken getmail processes.

There’s absolutely nothing in my getmail log indicating any kind of justification for this behaviour. This is what I have:

2009-12-31 22:29:00 Initializing SimpleIMAPSSLRetriever:user@mx01.sc1.parodius.com:993:
2009-12-31 22:30:00 Initializing SimpleIMAPSSLRetriever:user@mx01.sc1.parodius.com:993:
2009-12-31 22:31:00 Initializing SimpleIMAPSSLRetriever:user@mx01.sc1.parodius.com:993:
2009-12-31 22:32:00 Initializing SimpleIMAPSSLRetriever:user@mx01.sc1.parodius.com:993:
2009-12-31 22:32:40   0 messages (0 bytes) retrieved, 0 skipped

This lead me to look at network connectivity to the IMAP server (which I also own/maintain). I do periodic traceroutes/mtrs to certain locations, one of which is where the IMAP server resides. I found that starting at 22:27 and ending at around 22:32, there was a networking problem with my co-lo provider resulting in some packet loss when reaching my servers:

=== Thu Dec 31 22:26:00 PST 2009  (1262327160)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.4   0.4   0.3   0.4
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45   8.8  11.8   6.5  79.3
  4. 68.85.155.78                  0.0%    45    45   9.0  11.5   8.6  15.2
  5. 68.86.90.137                  0.0%    45    45  11.6  13.4  10.9  17.5
  6. 68.86.86.202                  0.0%    45    45  12.7  13.6  11.8  17.5
  7. 4.71.118.9                    0.0%    45    45  13.2  16.1  11.4  64.0
  8. 4.68.18.195                   0.0%    45    45  12.4  23.6  11.7 178.2
  9. 4.79.219.106                  0.0%    45    45  12.9  13.5  12.3  17.5
 10. 209.128.95.111                0.0%    45    45  13.5  14.8  12.3  29.7
 11. 209.128.109.253               0.0%    45    45  12.9  14.2  12.3  25.2
 12. 72.20.106.125                 0.0%    45    45  13.4  13.8  12.3  18.2
=== END

=== Thu Dec 31 22:27:00 PST 2009  (1262327220)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.4   0.4   0.3   0.6
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45   6.8   9.3   6.6  58.0
  4. 68.85.155.78                  0.0%    45    45  10.3  10.1   8.6  11.9
  5. 68.86.90.137                  0.0%    45    45  13.0  12.5  10.8  18.5
  6. 68.86.86.202                  0.0%    45    45  11.7  13.6  11.6  27.3
  7. 4.71.118.9                    0.0%    45    45  13.2  16.3  11.4  98.6
  8. 4.68.18.195                   0.0%    45    45  17.4  26.6  11.6 177.8
  9. 4.79.219.106                  0.0%    45    45  16.6  13.8  11.8  19.2
 10. 209.128.95.111               20.0%    45    36  19.9  17.9  16.2  19.9
 11. 209.128.109.253              11.1%    45    40  22.2  22.4  21.1  27.8
 12. 72.20.106.125                26.7%    45    33  30.2  27.0  25.0  32.5
=== END

=== Thu Dec 31 22:28:00 PST 2009  (1262327280)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.4   0.4   0.3   0.5
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45  10.9  12.4   6.7 144.7
  4. 68.85.155.78                  0.0%    45    45  11.1  11.0   9.0  13.9
  5. 68.86.90.137                  0.0%    45    45  12.0  12.7  10.8  22.0
  6. 68.86.86.202                  0.0%    45    45  14.3  12.9  11.5  14.9
  7. 4.71.118.9                    0.0%    45    45  13.0  19.0  11.4  86.2
  8. 4.68.18.195                   0.0%    45    45  12.6  26.4  11.7 237.5
  9. 4.79.219.106                  0.0%    45    45  14.0  13.5  11.8  21.5
 10. 209.128.95.111               22.2%    45    35  17.6  19.0  16.4  33.9
 11. 209.128.109.253              11.1%    45    40  21.3  22.3  20.7  27.6
 12. 72.20.106.125                40.0%    45    27  26.1  26.5  24.9  28.1
=== END

=== Thu Dec 31 22:29:00 PST 2009  (1262327340)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.3   0.4   0.3   0.4
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45   9.6   9.7   6.6  29.7
  4. 68.85.155.78                  0.0%    45    45  11.3  11.1   8.7  23.8
  5. 68.86.90.137                  0.0%    45    45  12.2  12.6  11.0  16.2
  6. 68.86.86.202                  0.0%    45    45  13.0  13.6  11.5  22.0
  7. 4.71.118.9                    0.0%    45    45  13.3  15.7  11.4  62.2
  8. 4.68.18.195                   0.0%    45    45  12.5  28.5  11.9 194.8
  9. 4.79.219.106                  0.0%    45    45  14.0  14.6  11.8  20.5
 10. 209.128.95.111               17.8%    45    37  17.7  19.0  16.7  26.8
 11. 209.128.109.253               6.7%    45    42  22.5  23.2  20.6  30.1
 12. 72.20.106.125                40.0%    45    27  26.2  26.7  24.8  33.5
=== END

=== Thu Dec 31 22:30:00 PST 2009  (1262327400)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.5   0.4   0.3   0.6
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45  10.0   8.7   6.6  12.9
  4. 68.85.155.78                  0.0%    45    45  11.9  10.9   8.7  18.5
  5. 68.86.90.137                  0.0%    45    45  11.5  12.6  10.8  19.3
  6. 68.86.86.202                  0.0%    45    45  13.4  13.4  11.7  17.8
  7. 4.71.118.9                    0.0%    45    45  12.4  18.0  11.5 102.2
  8. 4.68.18.195                   0.0%    45    45  14.0  27.0  11.8 185.0
  9. 4.79.219.106                  0.0%    45    45  13.0  13.3  11.7  16.8
 10. 209.128.95.111               24.4%    45    34  18.7  18.0  16.5  32.9
 11. 209.128.109.253              17.8%    45    37  21.2  22.9  20.4  49.7
 12. 72.20.106.125                44.4%    45    25  26.7  26.3  25.1  27.4
=== END

=== Thu Dec 31 22:31:00 PST 2009  (1262327460)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.4   0.4   0.3   0.6
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45   7.7   8.0   6.6   9.9
  4. 68.85.155.78                  0.0%    45    45   9.5  10.8   8.7  19.2
  5. 68.86.90.137                  0.0%    45    45  12.1  12.7  10.6  23.8
  6. 68.86.86.202                  0.0%    45    45  11.9  15.1  11.6  20.5
  7. 4.71.118.9                    0.0%    45    45  13.4  16.0  11.8  47.6
  8. 4.68.18.195                   0.0%    45    45  13.8  27.4  11.8 196.7
  9. 4.79.219.106                  0.0%    45    45  12.9  13.6  11.6  16.6
 10. 209.128.95.111               51.1%    45    22  19.0  17.9  16.3  19.9
 11. 209.128.109.253              46.7%    45    24  22.1  22.4  20.8  26.0
 12. 72.20.106.125                44.4%    45    25  27.1  26.6  24.5  29.5
=== END

=== Thu Dec 31 22:32:00 PST 2009  (1262327520)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.4   0.4   0.3   0.5
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45   9.7  12.5   6.6 192.5
  4. 68.85.155.78                  0.0%    45    45  11.0  10.6   8.6  19.0
  5. 68.86.90.137                  0.0%    45    45  12.1  12.8  10.6  26.4
  6. 68.86.86.202                  0.0%    45    45  11.9  13.1  11.6  22.5
  7. 4.71.118.9                    0.0%    45    45  13.3  15.8  11.4  65.6
  8. 4.68.18.195                   0.0%    45    45  12.6  27.8  12.0 179.5
  9. 4.79.219.106                  0.0%    45    45  13.9  13.3  11.7  17.2
 10. 209.128.95.111               40.0%    45    27  20.0  17.7  16.4  20.0
 11. 209.128.121.125              15.6%    45    38  27.6  23.9  21.0  28.3
 12. 72.20.106.125                46.7%    45    24  30.2  27.6  25.3  30.2
=== END

=== Thu Dec 31 22:33:00 PST 2009  (1262327580)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.4   0.4   0.3   0.5
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45   9.9   9.5   6.6  13.9
  4. 68.85.155.78                  0.0%    45    45   9.8  11.1   8.8  17.9
  5. 68.86.90.137                  0.0%    45    45  12.6  13.2  11.1  21.6
  6. 68.86.86.202                  0.0%    45    45  12.4  13.4  11.4  23.5
  7. 4.71.118.9                    0.0%    45    45  12.4  13.9  11.4  31.9
  8. 4.68.18.195                   0.0%    45    45  33.0  14.5  12.0  46.1
  9. 4.79.219.106                  0.0%    45    45  12.4  13.1  11.6  15.9
 10. 209.128.95.111               26.7%    45    33  18.5  18.1  16.5  24.0
 11. 209.128.121.125              11.1%    45    40  23.1  22.4  20.9  30.7
 12. 72.20.106.125                44.4%    45    25  26.5  27.2  24.8  37.4
=== END

=== Thu Dec 31 22:34:00 PST 2009  (1262327640)
HOST: icarus.home.lan             Loss%   Snt   Rcv  Last   Avg  Best  Wrst
  1. 192.168.1.1                   0.0%    45    45   0.4   0.4   0.3   1.9
  2. ???                          100.0    45     0   0.0   0.0   0.0   0.0
  3. 68.85.191.253                 0.0%    45    45   7.8   8.3   6.6  13.1
  4. 68.85.155.78                  0.0%    45    45  10.5  10.8   8.9  19.5
  5. 68.86.90.137                  0.0%    45    45  13.0  12.9  10.9  33.0
  6. 68.86.86.202                  0.0%    45    45  12.0  14.1  11.7  18.7
  7. 4.71.118.9                    0.0%    45    45  13.0  13.6  11.8  17.0
  8. 4.68.18.195                   0.0%    45    45  12.4  14.8  11.6  34.4
  9. 4.79.219.106                  0.0%    45    45  14.0  14.2  11.9  18.5
 10. 209.128.95.111                0.0%    45    45  13.1  14.9  12.2  20.1
 11. 209.128.121.125               0.0%    45    45  14.6  15.1  12.5  22.4
 12. 72.20.106.125                 0.0%    45    45  15.2  15.6  12.2  22.4
=== END

I dug around on the getmail FAQ and found this entry (I like how it’s labelled “notabug”) which indicates that massive memory allocation is a bug in python 2.4’s IMAP client library, and to upgrade to python 2.5 or 2.6. Well, I hate to break it to you, but…

$ python --version
Python 2.6.2

At this point I’ll be moving back to fetchmail. Why did I switch from fetchmail to getmail in the first place? Because fetchmail appears to have a bug where, on very rare occasion, it delivers a single mail to the MTA twice, despite there only being a single instance of the mail on the IMAP server. My MTA (postfix) logs show this, as do Received: delivery headers.

I can deal with very rare double-delivery of a single mail, but I cannot deal with a piece of software which will eventually completely allocate all memory (real and swap) on the machine when there’s upstream network problems. IMAP is TCP, so there’s absolutely no reason for this behaviour. It’s obviously a bug — I don’t care if it’s in getmail or python — and it’s one administrators should be made aware of.