Massive mail import in GMail
Monday, August 25th, 2008
Recently I had to archive a considerable number of mails into GMail. I don’t want to keep those mails in my Mail User Agent (Mail.app), yet I want to be able to search and retrieve any single mail. Of course I’m keeping an mbox version of those mails, but once mail is in an mbox it’s kind of dead: you cannot easily read a single message or search the mbox.
I considered several solutions, and the simplest I could find is to dedicate a GMail account to that purpose. The only drawback is that I need Internet access to reach the archive, but I’m fine with that. In addition, the IMAP support provided by GMail makes this possible without any hacks.
Once my special GMail account was created, I started using my mail client to copy the messages into dedicated folders in the archive account. It worked fine for mailboxes of around 5,000 messages, but some of my mailboxes have far more, like 60,000. Above 5,000 messages, it becomes a nightmare for two reasons:
- The mail client synchronizes with the archive account, which slows down the whole process.
- Network failures are more likely to happen, which makes the whole process very fragile. When a network failure occurs, you don’t know how well your mail client will handle resuming the copy.
So I wrote a Java program to do the import instead, built on:
- JavaMail: the cornerstone.
- JSR 166 Concurrency Utilities: the CompletionService makes it trivial to multithread the import and improve throughput.
- Mstor: a local store provider for JavaMail, which can read the mbox format through the JavaMail API.
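Mstor hides the mbox parsing behind the JavaMail API. To give an idea of what that involves, here is a deliberately simplified, JDK-only splitter. It is not Mstor’s code, and `MboxSplitter` is a hypothetical name. It assumes the common mbox convention that each message starts with a separator line beginning with `From `, and it ignores details a real parser must handle, such as `>From ` unquoting (an unquoted `From ` line inside a body would be mistaken for a separator) and Content-Length based variants.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MboxSplitter {

    /**
     * Splits mbox content into raw messages. A line starting with "From "
     * (the mbox envelope line) opens a new message and is not kept.
     * Simplified sketch: no ">From " unquoting, no Content-Length variants.
     */
    public static List<String> split(BufferedReader reader) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // null until the first separator is seen
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("From ")) {
                if (current != null) {
                    messages.add(current.toString()); // close the previous message
                }
                current = new StringBuilder();
            } else if (current != null) {
                // readLine() strips CR/LF, so line endings are normalized to "\n".
                current.append(line).append('\n');
            }
            // Lines before the first separator are ignored.
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    /** Convenience overload for in-memory content. */
    public static List<String> split(String mbox) {
        try {
            return split(new BufferedReader(new StringReader(mbox)));
        } catch (IOException e) {
            // readLine() declares IOException; an open StringReader cannot actually fail here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // ISO-8859-1 maps every byte to a char, so 8-bit mail bodies never fail to decode.
            try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
                System.out.println(split(in).size() + " messages");
            }
            return;
        }
        String demo = "From alice@example.com Mon Aug 25 10:00:00 2008\n"
                + "Subject: first\n\nHello\n\n"
                + "From bob@example.com Mon Aug 25 11:00:00 2008\n"
                + "Subject: second\n\nBye\n";
        System.out.println(split(demo).size()); // prints 2
    }
}
```

Each raw string could then be parsed into a JavaMail message with the `MimeMessage(Session, InputStream)` constructor; with Mstor you instead get `Message` objects straight from a JavaMail `Folder`, which is what makes it convenient here.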
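The CompletionService part can be sketched with the JDK alone. This is a minimal sketch under stated assumptions, not the actual importer: `ParallelImporter`, `Uploader` and `importAll` are hypothetical names, and the real upload step (presumably appending each message to a GMail folder over IMAP through JavaMail) is replaced by a stand-in interface. Tasks are submitted to an `ExecutorCompletionService`, results are consumed with `take()` in completion order, successful indices go into a `done` set that could be persisted to resume after a network failure, and failed indices are returned for a retry pass.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelImporter {

    /** Hypothetical upload step; a real importer would append one message over IMAP here. */
    public interface Uploader {
        void upload(int index, String rawMessage) throws Exception;
    }

    /** Outcome of one upload task. */
    private static final class Result {
        final int index;
        final boolean ok;
        Result(int index, boolean ok) { this.index = index; this.ok = ok; }
    }

    /**
     * Uploads every message whose index is not in {@code done}, on {@code threads} workers.
     * Successful indices are added to {@code done}; failed indices are returned for a retry.
     */
    public static Set<Integer> importAll(List<String> messages, Set<Integer> done,
                                         Uploader uploader, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Result> completion = new ExecutorCompletionService<>(pool);
        Set<Integer> failed = new TreeSet<>();
        try {
            int submitted = 0;
            for (int i = 0; i < messages.size(); i++) {
                if (done.contains(i)) {
                    continue; // already imported during an earlier run
                }
                final int index = i;
                completion.submit(() -> {
                    try {
                        uploader.upload(index, messages.get(index));
                        return new Result(index, true);
                    } catch (Exception e) {
                        return new Result(index, false); // e.g. a network failure
                    }
                });
                submitted++;
            }
            // take() returns finished tasks in completion order, not submission order.
            for (int n = 0; n < submitted; n++) {
                Result r;
                try {
                    r = completion.take().get();
                } catch (ExecutionException e) {
                    // Not expected: each task catches its own upload failure.
                    throw new IllegalStateException(e.getCause());
                }
                if (r.ok) {
                    done.add(r.index);
                } else {
                    failed.add(r.index);
                }
            }
        } finally {
            pool.shutdown();
        }
        return failed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            messages.add("Subject: message " + i + "\n\nbody\n");
        }
        // Simulated flaky network: messages 3 and 7 fail on their first attempt only.
        Set<Integer> flaky = ConcurrentHashMap.newKeySet();
        flaky.add(3);
        flaky.add(7);
        Uploader uploader = (index, raw) -> {
            if (flaky.remove(index)) {
                throw new Exception("simulated network failure");
            }
        };
        Set<Integer> done = new HashSet<>();
        Set<Integer> failed = importAll(messages, done, uploader, 4);
        System.out.println("first pass failed: " + failed); // prints "first pass failed: [3, 7]"
        failed = importAll(messages, done, uploader, 4);
        System.out.println("second pass failed: " + failed + ", imported: " + done.size()); // prints "second pass failed: [], imported: 10"
    }
}
```

Because only the calling thread reads results from `take()`, the `done` and `failed` sets need no synchronization, and a slow message never blocks the bookkeeping for the ones that finish before it. The thread count is a free parameter here; with a real IMAP backend each worker would likely need its own connection, so it is worth keeping modest.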
