Fetchmail and large emails

nirav April 29th, 2006

When I came to the office on Saturday, I wanted to spend some time on OpenCerti and complete the pending ActionScript work. But there was a problem waiting! The emails were stuck, and we were getting duplicate emails.

Let me give you some background. We have a set of Linux servers in the office. The magnet-i.com site is hosted in a NOC in the USA. We have a few POP accounts on that server and a catchall. The catchall captures emails for most of the users. The POP accounts are primarily used by people who travel and need roaming access. We download emails to a local server using Fetchmail. The internet connection is via cable modem and DSL.

This kind of problem has happened before. I remember that in the dialup days, if we had a large number of emails and the internet connection dropped while fetchmail was running, it would start downloading all the emails again on the next connection, resulting in duplicate emails.

This time around, we have a major recruitment drive going on. The catchall account was more than 85MB in size, and fetchmail was giving up on it. It was not only the size but also the number of messages that was big. And the mails were not only for HR, but also for other people in the organization.

Vishal and I looked at the problem and first tried to delete unwanted mails via webmail. We got it down to 55MB, but this was still too big for fetchmail to handle on our internet connection.

The next step we generally take is to take the mbox file, bzip2 it, download it over the web, append it to the local mbox, and let the mails flood into the email client. This time we couldn’t simply do that, because the emails were destined for more than one user account. Still, this is the track we took up.
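Whether the simple "append to one local mbox" route is possible depends on who the mails in the file are addressed to. Here is a minimal Python sketch for checking that, assuming the recipients can be read from the To and Cc headers (which will miss Bcc'd or list mail). The function name `recipients_in_mbox` is made up for illustration.

```python
import mailbox
from collections import Counter
from email.utils import getaddresses


def recipients_in_mbox(path, domain):
    """Count To/Cc addresses at `domain` across all messages in an mbox file.

    Assumption: header recipients are a good enough approximation of the
    real (envelope) recipients of a catchall mailbox.
    """
    suffix = "@" + domain.lower()
    counts = Counter()
    box = mailbox.mbox(path, create=False)  # raise if the file is missing
    try:
        for msg in box:
            headers = msg.get_all("To", []) + msg.get_all("Cc", [])
            # str() guards against Header objects for non-ASCII values
            for _name, addr in getaddresses([str(h) for h in headers]):
                addr = addr.lower()
                if addr.endswith(suffix):
                    counts[addr] += 1
    finally:
        box.close()
    return counts
```

Calling `recipients_in_mbox(path_to_mbox, "magnet-i.com")` returns a counter keyed by address. If it has more than one key, the mbox holds mail for several users and cannot just be appended to a single mailbox.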

The mbox file contains all the mails in a single file. So we SSH’ed to the server and located the catchall account’s mailbox (something like /home/user/mail/domainname/emailaccount/ on cPanel servers). Running bzip2 on it got the file size down to 11MB. (OT: I wonder why they didn’t add compression to POP/SMTP. That would have saved a lot of traffic.)
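We used the bzip2 command itself on the server. For anyone who prefers to script this step, the following is a rough Python equivalent using the standard `bz2` module. The helper name `bzip2_file` is hypothetical, and unlike the bzip2 command this sketch leaves the original file in place.

```python
import bz2
import os
import shutil


def bzip2_file(src, dst=None, level=9):
    """Compress `src` into `dst` (default: src + '.bz2').

    Returns (dst, original_size, compressed_size) so the saving is visible.
    The source file is NOT deleted.
    """
    dst = dst or src + ".bz2"
    with open(src, "rb") as fin, bz2.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks, no full read into RAM
    return dst, os.path.getsize(src), os.path.getsize(dst)
```

Mail is mostly text with very repetitive headers, which is why the file shrinks so much. How much you save will depend on how many attachments are in the mailbox.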

Downloading this via the web (so that we could resume the download if it got interrupted) did not work. Somehow our server did not allow direct file downloads. (I guess it was me who disabled hot links like this...) We didn’t have time to reconfigure the server, so we simply FTP’ed the file to one of the servers that we use for exchanging files with clients. We then downloaded the 11MB file to a local machine and put it on the mail server.

Now what?

Idea! What if we configure fetchmail to connect to the local POP server? We could create a new user and push the downloaded mails into the new user’s mbox file. When fetchmail connects to the local server to fetch emails, it will fetch emails from this new account. And it can deliver them to the local users as per the original configuration!

The idea was right, and we tested it with one or two messages in the mbox file. At first the mails bounced back, saying there was a “mail forwarding loop”. We removed the “no dns, aka magnet-i.com” part from the fetchmail config, and it started pushing the emails to the postmaster. At least they did not bounce back! A few trials later, after inspecting the logs and the postmaster emails, we figured it out: we need to keep the “aka magnet-i.com” line in, and use the mbox file from the server.

So we set this up, ran fetchmail, and it went chopping through the mbox like crazy, delivering emails to the local users. If we had used some other method (like downloading only new messages with UIDL), it would have taken five times longer to download the emails, and we would have had to monitor the process.

This gives us the following best practices for handling large emails that are stuck on the server:

  • Bzip2 the mbox file and download it via the web.
  • Decompress the mbox file on the local server and process it there.
  • If it’s a single email account, simply append the mbox file to the local mbox file and let the user download the emails via her email client.
  • If there are multiple email accounts in the mbox file (the mbox of a catchall), create a new account on the local server and append the mbox to it. Add a rule in .fetchmailrc to use the local server as the POP3 server and fetch emails for the newly created account, then distribute them with “* here”.
  • Use “fetchmail -v” for verbose output.
  • Monitor /var/log/maillog and /var/log/fetchmail.log for info. Also check the postmaster account (we run Postfix) for error reports.
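The decompress and append steps can be sketched in Python with the standard `bz2` and `mailbox` modules. This is only an illustration, not what we ran on the day. The helper names are made up, and it assumes the locking done by Python's `mailbox` module (dot lock plus fcntl) is compatible with whatever your local delivery agent uses.

```python
import bz2
import mailbox
import shutil


def decompress_bz2(src, dst):
    """Write the decompressed contents of `src` (.bz2) to `dst`."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def append_mbox(downloaded, local):
    """Append every message from `downloaded` to the `local` mbox.

    Returns the number of messages appended. The local mbox is created
    if it does not exist and is locked while we write to it.
    """
    src = mailbox.mbox(downloaded, create=False)
    dst = mailbox.mbox(local)
    added = 0
    dst.lock()
    try:
        for msg in src:
            dst.add(msg)
            added += 1
        dst.flush()
    finally:
        dst.close()  # also releases the lock
        src.close()
    return added
```

For the single account case, `local` would be the user's own mbox. For the catchall case it would be the mbox of the newly created account that fetchmail then polls on the local POP3 server.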

Good troubleshooting for the day! I am off to something bigger now!
