Folks: I’d like to create a file system archive of an email account. Suppose that I have 100 messages in an email account (for concreteness’ sake, let’s assume they are in Gmail). I can configure the email account as a POP or IMAP server. From a Windows, Mac, or Linux machine, I would like to fetch all 100 of the email messages and write each one out to a file. I would like it if the files were sensibly named, but that isn’t critical. How can I do this?
I think that I’ve tried the obvious solution, which is to pull the messages down into Microsoft Outlook and then “export” a folder. The result is a .csv file containing all the messages from the folder.
A friend suggested using Thunderbird and the following extension: http://www.nic-nac-project.de/~kaosmos/mboximport-en.html
Better ideas?
getmail: http://pyropus.ca/software/getmail/
I’ve previously used offlineimap http://software.complete.org/offlineimap for this task. It synchronizes an IMAP account with a local folder (in Maildir format, i.e. one file per message). You can run it just once, or regularly to have a running backup.
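For anyone who would rather script the IMAP route than install a sync tool, here is a rough Ruby sketch of the same idea: one file per message, named by IMAP UID. It is not how offlineimap works internally, only an illustration. The helper name `archive_folder`, the output directory, and the host and credentials in the comments are my own placeholders.

```ruby
require "fileutils"

# Hypothetical helper: save every message in one IMAP folder as a .eml file.
# `imap` is an already-authenticated Net::IMAP connection (or any object that
# responds to examine, uid_search and uid_fetch in the same way).
def archive_folder(imap, folder, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  imap.examine(folder)                     # open the folder read-only
  imap.uid_search(["ALL"]).map do |uid|
    data = imap.uid_fetch(uid, "RFC822")   # raw message, headers and body
    next if data.nil? || data.empty?
    path = File.join(dest_dir, "#{uid}.eml")
    File.binwrite(path, data.first.attr["RFC822"])
    path
  end.compact
end

# Example connection (host, account and password are placeholders):
#   require "net/imap"
#   imap = Net::IMAP.new("imap.gmail.com", port: 993, ssl: true)
#   imap.login("you@gmail.com", "app-password")
#   archive_folder(imap, "INBOX", "gmail-archive")
#   imap.logout
#   imap.disconnect
```

The folder is opened with `examine` rather than `select` so the fetch should not mark anything as read. On recent Ruby versions net/imap ships as a bundled gem, so it may need to be installed separately.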
I use a not too complex method for this on Linux.
1. Configure mail system to use “Maildir” Format. This is one file per message.
2. “Fetchmail” to retrieve mail from pop/imap.
Fetchmail can be configured to deal with multiple mail sources with various setup scripts. I use fetchmail because I want manual control over when it polls for mail rather than have the automatic fetching of email.
The filenames are randomly chosen gibberish.
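The gibberish names matter less than they seem, because each Maildir file is just the raw message text. As a minimal sketch, assuming the standard Maildir layout (delivered mail in `new/` and `cur/`), the following Ruby lists each file next to its Subject header. The function name `maildir_subjects` is made up for illustration, and it does not handle folded or MIME-encoded subjects.

```ruby
# List every message in a Maildir along with its Subject header.
# Assumes the standard layout: delivered mail sits in new/ and cur/.
def maildir_subjects(maildir)
  Dir.glob(File.join(maildir, "{new,cur}", "*")).sort.map do |path|
    # Headers end at the first blank line; ignore everything after it.
    headers = File.binread(path).split(/\r?\n\r?\n/, 2).first.to_s
    subject = headers[/^Subject:[ \t]*(.*)$/i, 1].to_s.strip
    [File.basename(path), subject]
  end
end

# Usage (the path is a placeholder):
#   maildir_subjects(File.expand_path("~/Maildir")).each do |name, subject|
#     puts "#{name}\t#{subject}"
#   end
```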
Just a vague hunch that “fetchmail” ought to be able to do that.
Mail.app on the Mac stores them this way by default (so Spotlight can index them). If you’re already using it, just look in ~/Library/Mail/MAILBOXNAME.
Otherwise, just set up the IMAP (or POP) account in Mail.app. It will download all the messages in the account and store them in ~/Library/Mail/ in their IMAP folder structure. The filenames for the individual messages are just integers, but the files are basically plaintext (there’s a parallel attachments directory which contains all the decoded attachments with the same index ids as the messages).
Looks like ~/Library/Mail/Envelope Index probably has indexing info, and it’s just a SQLite database, so you might be able to easily script out something to rename the files to a subject line or something if you -had- to have better filenames. (interesting link: http://www.javarants.com/2008/12/26/build-your-own-mail-analyzer-for-mac-mailapp/#more-943)
Funny I was just working with ruby to do something similar… after a couple small modifications, the following snippet should do exactly what you need.
http://pastie.org/871368
Oops. Apparently, angle brackets are not allowed. Here’s what I meant to write:
Matt Cutts, a long-time Google employee, wrote this article on the topic:
http://www.mattcutts.com/blog/backup-gmail-in-linux-with-getmail/
I haven’t tried it yet, but I intend to.
A number of years ago I wrote a Python script to back up my Gmail into folders, because I wanted a backup that was also easy to archive (i.e., one that didn’t require a specific email client). I went ahead and uploaded the script to GitHub since I thought you might find it useful:
http://github.com/Jaymon/Popbak
You can do this so easily in Thunderbird without any special extensions. Pull all the emails down, then drag ’em into a new folder, right-click on that folder, and check out the export options. As I recall, you can export as txt files, eml files, html files (with html index), a big csv file… Anyway, you have several options right out of the box for exporting as individual files.
Mail Trends might be useful — in addition to being optimized for Gmail, it supports interesting visualizations:
http://code.google.com/p/mail-trends/
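It seems to me this is a case where a 5-line Ruby script is far easier than any app. From the cookbook:
require 'net/pop'
# Connect to the POP3 server (placeholder host and credentials).
conn = Net::POP3.new('mail.myhost.com')
# For Gmail, use Net::POP3.new('pop.gmail.com', 995) and call conn.enable_ssl before start.
conn.start('username', 'password')
# Write each message to a file named after its server-assigned unique ID.
conn.mails.each { |msg| File.open(msg.uidl, 'w') { |f| f.write msg.pop } }
conn.finish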
A Common Lisp solution: http://darcs2.informatimago.com/bin/fetch-pop
( Would have to be modified to split the mailbox into one file per message, but it should be trivial for someone like you 😉 ).
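The splitting step mentioned above is not tied to Lisp. Here is a rough Ruby sketch of it, assuming the common mbox convention that every message starts with an envelope line beginning with "From " (with a trailing space). The function name `split_mbox` and the numbered output names are placeholders of mine, and the sketch does not undo any ">From " quoting inside message bodies.

```ruby
require "fileutils"

# Split an mbox file into one file per message.
# Each "From " envelope line starts a new output file; the envelope line
# itself is not copied into the output.
def split_mbox(mbox_path, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  paths = []
  out = nil
  File.foreach(mbox_path, mode: "rb") do |line|
    if line.start_with?("From ")
      out&.close
      paths << File.join(dest_dir, format("%05d.eml", paths.length + 1))
      out = File.open(paths.last, "wb")
    elsif out
      out.write(line)
    end
  end
  paths
ensure
  out&.close
end

# Usage (paths are placeholders):
#   split_mbox("inbox.mbox", "messages")
```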
Otherwise this can also be done with fetchmail -m.
I’ve used RJH’s method, but with getmail (which has had one of the world’s most attentive developers working on it for way too long) and maildrop instead of fetchmail + procmail, which is the more popular combination, and used Mutt to read the mail. I think with maildrop you can write a few regexes to perhaps take the subject line and use that as a filename.
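To make the subject-as-filename idea concrete, here is a small sketch in plain Ruby (not maildrop filter syntax, which I have not tried to reproduce). It replaces characters that tend to cause trouble in filenames and appends " (n)" when a name is already taken. The function name `filename_for`, the 80-character cap, and the ".eml" suffix are arbitrary choices of mine.

```ruby
# Turn a Subject line into a filesystem-friendly name, adding " (n)" when
# that name already exists in dest_dir.
def filename_for(subject, dest_dir, ext = ".eml")
  # Keep letters, digits, dot, underscore, space and hyphen; replace the rest.
  base = subject.to_s.gsub(/[^A-Za-z0-9._ -]+/, "_").strip
  # Cap the length and drop leading dots or spaces (no hidden or relative names).
  base = base[0, 80].sub(/\A[. ]+/, "")
  base = "no-subject" if base.empty?
  candidate = base + ext
  n = 1
  while File.exist?(File.join(dest_dir, candidate))
    candidate = "#{base} (#{n})#{ext}"
    n += 1
  end
  candidate
end

# Usage (the directory is a placeholder):
#   filename_for("Re: Q1 report/draft?", "messages")
```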
Other than the naming thing, any combo of the two sets is very easy on Unix.
There’s already a program that does it for Gmail. Google “back up gmail” and you’ll find it. It creates a single file for each message in a specific folder.
Peter seconded. getmail + maildir.
TCLLIB has POP procs: http://tcllib.sourceforge.net/doc/pop3.html
I’ve used getmail into maildir format for years w/o any problems. Great setup.
There is also MessageSave for Outlook. I used this a few years ago on Outlook and it was pretty easy to set up. It’s not free, though.
http://www.techhit.com/messagesave/
Gee, I do this all the time with good ol’ Outlook Express 6. Within OE6, just highlight all the messages you want to save, then click-and-drag them to a folder. Each message is saved as a separate file, with the subject as a filename. It intelligently adds (n) where n is an integer to prevent same-name conflicts. Each file has an .eml suffix but is basically a text file. It uses the “received” time for the timestamp of the file. Couldn’t be simpler. I believe this works in OE6’s successors (Windows Mail and Windows Live Mail), but I haven’t thoroughly tested this.
There is also a Python library for Gmail.