Rethink the backup story
Open, Normal, Public

Description

We should probably rethink how we do backups. The current situation is suboptimal: the duplicity caches take up an enormous amount of disk space, and I'm not even sure we're encrypting backups at all. Furthermore, the retention policy is lacking (we'd like tiered backups, e.g. monthly/weekly/daily/hourly), and the new servers on Rackspace aren't using backups yet (see T11).
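
For reference, something like the following would tell us how big the caches really are, whether the existing chains are even encrypted, and what duplicity's own pruning knobs look like in the meantime. The file:// URL and paths are placeholders for whatever backend we actually use:

```
# Quick audit of the existing duplicity setup (placeholder target URL and paths).
du -sh ~/.cache/duplicity                          # size of duplicity's local caches
duplicity collection-status file:///srv/backups    # what full/incremental chains exist
ls /srv/backups | head                             # volumes ending in .gpg are encrypted; .gz means no encryption

# duplicity can also prune on its own, e.g. keep only the 4 most recent full chains:
# duplicity remove-all-but-n-full 4 --force file:///srv/backups
```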

In general, once T12 is done, I think we should seriously consider reallocating some of our budget here. We could roll our own solution (e.g. a FreeBSD server on Rackspace with a ZFS RAID10-style pool), but that would eat into our free tier with Rackspace, which we'd like to keep if possible (for GHC builders and for new services). Ideally we could actually cut costs by using a specialized service.

So I'll make a proposal: provided we save money and move everything to Rackspace (post T12), I'd strongly advocate investigating Tarsnap as a robust and reliable solution. It's high-security (true 'backups for the paranoid', which I think we should be), well designed, very robust, and has saved my ass numerous times while costing basically pennies on the dollar thanks to compression and deduplication. Furthermore, it dramatically simplifies management: Tarsnap needs nothing more than some basic cron jobs, plus backups of some very small private key files (easily printable, even). @johnw and I could each store the key material, for example (since the key allows you to extract backups from anywhere), encrypt it to a secret passphrase for safety (in case of compromise) using the scrypt encryption utility, and back it up somewhere else.
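
To make that concrete, here's a rough sketch of the moving parts. The key path, email address, backup paths, and schedule below are placeholders, not a worked-out design:

```
# One-time, per machine: generate a Tarsnap key (this is the small keyfile we'd escrow or print):
#   tarsnap-keygen --keyfile /root/tarsnap.key --user admin@haskell.org --machine $(hostname)

# Wrap the escrowed copy in a passphrase with the scrypt utility before storing it elsewhere:
scrypt enc /root/tarsnap.key tarsnap.key.scrypt    # prompts for the passphrase

# Nightly backup from cron (e.g. /etc/cron.d/tarsnap); note that % must be escaped in crontabs:
# 0 3 * * * root tarsnap --keyfile /root/tarsnap.key -c -f "$(hostname -s)-$(date +\%Y-\%m-\%d)" /etc /home /srv
```

Restoring anywhere is then just `tarsnap -x -f <archive>` with the decrypted keyfile in hand.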

It's hard to estimate exactly how much this would cost, though, because we have to factor in deduplication. However, tarballing and compressing the current backups should give a rough upper bound, and the real cost is amortized by deduplication and by the fact that we can delete stale archives over time to keep the monthly rate essentially flat.
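
A quick way to get that number, assuming the current backups live somewhere like /var/backups (adjust to wherever duplicity actually writes), is to stream a compressed tarball of them into a byte count; deduplication would only bring the real Tarsnap figure down from there:

```
# Upper bound on the compressed (but not deduplicated) size of the current backups.
tar -cf - /var/backups | gzip -9 | wc -c
```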

I also imagine that once we rummage around the servers, we'll figure out exactly which data is *really* critical and needs constant backups, what frequency each of those needs, and we'll be able to eliminate a lot of cruft and old junk. For example, there's no reason to back up /etc when most of it can live in Git anyway (sketch below), and very high-churn things like the Hackage server will be much more manageable once we split out the builder process, which doesn't need backups and is essentially ephemeral (see T5).
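
(For the /etc-in-Git part, a tool like etckeeper would get us most of the way with almost no effort; the sketch below assumes a Debian-ish host and is only an illustration.)

```
# Hypothetical: track /etc in Git with etckeeper rather than backing it up wholesale.
apt-get install etckeeper git    # assumes a Debian/Ubuntu host
etckeeper init                   # turns /etc into a Git repository
etckeeper commit "Initial import of /etc"
# etckeeper also auto-commits around package installs and upgrades.
```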

austin created this task. Jun 4 2014, 1:09 AM
austin updated the task description.
austin raised the priority of this task to High.
austin added a subscriber: Haskell.org Infrastructure.
austin added a subscriber: austin. Jun 4 2014, 1:13 AM

Oh, and another alternative, of course, is Rackspace's built-in backup support, which is fairly cheap IIRC (it uses their standard Cloud Files pricing, since that's where the backups are stored). But I just have more experience with Tarsnap in general and like it a lot.

austin edited this Maniphest Task. Jun 4 2014, 1:25 AM
austin claimed this task. Jul 15 2014, 10:56 PM
austin lowered the priority of this task from High to Normal. Jul 15 2014, 11:51 PM

For the moment, I've mitigated the severity of this problem by enabling Rackspace backups for the main servers we're hosting. As we create new servers, we'll enable backups for them as well.

Backups now run daily for darcs.haskell.org (re: T11), planet.haskell.org, phabricator.haskell.org, and mysql01.haskell.org.

hvr added a subscriber: hvr. Jul 16 2014, 1:54 AM

Just wondering, as I've lost track / forgotten: what's the current backup status for ghc.haskell.org? Should we be worried?

@hvr We should keep it backed up. All the more incentive to move it to Rackspace if we can get backups there for free for now! (See also T12.)

austin changed the visibility from "All Users" to "Public (No Login Required)". Oct 25 2014, 3:17 AM
austin changed the edit policy from "All Users" to "Haskell.org Infrastructure (Project)".