We should rethink how we do backups. The current situation is suboptimal: the duplicity caches take up enormous amounts of disk space, and I'm not even sure we're encrypting backups at all. Furthermore, the retention policy is probably wrong (we'd like tiered backups: monthly/weekly/daily/hourly), and new servers on Rackspace aren't being backed up yet (see T11).
In general, once T12 is done, I think we should seriously consider reallocating some of that money. We could roll our own solution (e.g. a FreeBSD Rackspace server with ZFS RAID10), but that would eat into our free tier with Rackspace, which we'd like to preserve if possible (for GHC builders and for new services). Ideally a specialized service would actually cut costs.
So here's a proposal: provided we save money by moving everything to Rackspace (post T12), I'd strongly advocate investigating Tarsnap as a robust and reliable solution. It's high-security (true 'backups for the paranoid', which I think we should be), well designed, very robust, and has saved my ass numerous times while costing pennies on the dollar thanks to compression and deduplication. It also dramatically simplifies management: Tarsnap needs nothing more than some basic cron jobs, plus backing up some very small private key files (easily printable, even). @johnw and I could each hold the key material, for example (since the key allows extracting backups from anywhere), encrypt it with a secret passphrase using the scrypt encryption utility (for safety in case of compromise), and back it up somewhere else.
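To make "basic cron jobs" concrete, here's a minimal sketch of what that could look like. The paths, schedule, and archive names are placeholders I made up for illustration, not a final design; `tarsnap -c -f` and `scrypt enc` are the real CLI invocations from those tools:

```shell
# /etc/cron.d/tarsnap -- hypothetical schedule; paths are placeholders.
# Daily archive of the critical directories at 03:15.
# (% must be escaped in crontab entries.)
15 3 * * *  root  /usr/local/bin/tarsnap -c \
    -f "daily-$(date +\%Y-\%m-\%d)" /etc /srv/critical

# One-time, outside cron: encrypt the machine key with a passphrase
# using Colin Percival's scrypt utility, then stash the result off-site.
#   scrypt enc /root/tarsnap.key /root/tarsnap.key.enc
```

Stale archives can later be dropped with `tarsnap -d -f <archive>`; Tarsnap only charges for blocks still referenced by some archive, so pruning directly lowers the bill.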
In general it's hard to estimate exactly how much this would cost, because we have to factor in deduplication. However, tarballing and compressing the current backups should give a rough upper bound on the initial upload, and the ongoing cost is amortized by deduplication and by deleting stale backups over time, which keeps the rates essentially flat.
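The estimate above is easy to get mechanically; something like the following, where `BACKUP_DIR` is a placeholder for wherever the current duplicity backups live. Dedup across snapshots only ever shrinks the number from here, so this is a ceiling, not a prediction:

```shell
# Approximate the initial Tarsnap upload: stream the existing backup
# tree through gzip and count the compressed bytes.
BACKUP_DIR=${BACKUP_DIR:-/var/backups}   # placeholder path
bytes=$(tar -cf - "$BACKUP_DIR" 2>/dev/null | gzip -c | wc -c)
echo "compressed size: $bytes bytes"
```

Multiplying that figure by Tarsnap's per-byte storage and bandwidth prices gives a worst-case first-month number to sanity-check against what we pay now.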
I also imagine that once we rummage around the servers, we'll figure out exactly which data is *really* critical and needs constant backups, what rate each dataset should be backed up at, and we'll be able to eliminate a lot of cruft and old junk. For example, there's no reason to back up /etc when most of it can be kept in Git anyway, and very high-churn things like the Hackage server will be much more manageable once we split out the builder process, which doesn't need backups and is essentially ephemeral (see T5).