This is a follow-up to a previous post, in which I laid out the principles of how storage should be managed. This post is the first of two that go into detail about hardware, software, and how to set everything up.
If you’re not already familiar with RAID, go read the Wikipedia entry, although this post will cover the parts of the subject that matter for the task at hand.
This post will focus on how to store REDCODE-compressed 4K footage or other types of media with moderate data rates. I haven’t done sufficient testing on such commodity storage systems to say whether it’s possible to scale performance up enough to handle uncompressed 4K. The 28 MB/s of REDCODE RAW 4K is not a particularly high data rate for modern desktop drives; even a single drive can handle it without much trouble. That means the recommendations in this post will focus on achieving low cost, high reliability, and simple management, not the highest possible data transfer rates.
As I mentioned in my last post on this subject, storage should be centrally managed. This means building a single storage array, or at least one array per project, rather than e.g. letting external FireWire drives proliferate around an office.
The options for this used to be quite expensive, but fortunately, a technology has come along in recent years which makes things much cheaper and easier: eSATA. And, specifically, eSATA + port multiplication.
Port multiplication, for our purposes, is useful because it lets an external enclosure contain multiple drives, which show up as separate drives to the host computer. While this was technically possible with some SCSI variants, it was expensive, and while there were FireWire enclosures that did this, the more limited bandwidth of FireWire made it somewhat unappealing. eSATA II has a 3 Gbit/s link, which works out to a theoretical maximum of about 300 MB/s, more than enough that with a typical drive array, the interface is not likely to become a bottleneck.
How much storage are we shooting for here? Well, we’re trying to keep this fairly cheap, so let’s start by shooting for enough storage for our first feature, which will be 110 minutes, and will be shot at a 7:1 shooting ratio. That’s 770 minutes of footage, at ~1680 MB/minute, or 1263 GB. We’ll also need space for editing proxies and a bit of a safety margin, and drive vendors cheat by using decimal rather than binary gigabytes, so let’s call the whole array 2 TB. We’d also probably prefer to have some hardware redundancy, so we’ll factor that in as well.
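The arithmetic above can be checked with a short Python sketch. The function names are mine, and it assumes the same convention the 1263 GB figure implies (1024 MB per GB); the last function shows roughly how much a nominal, decimal "2 TB" shrinks when measured in binary gigabytes.

```python
# Rough capacity estimate using the figures from the post.
DATA_RATE_MB_PER_S = 28   # REDCODE RAW 4K data rate
RUNTIME_MIN = 110         # length of the finished feature
SHOOTING_RATIO = 7        # 7:1 shooting ratio


def footage_gb(runtime_min, ratio, rate_mb_per_s):
    """Footage size in GB, assuming 1024 MB per GB."""
    minutes_shot = runtime_min * ratio
    megabytes = minutes_shot * 60 * rate_mb_per_s
    return megabytes / 1024


def decimal_tb_to_binary_gb(terabytes):
    """Convert vendor (decimal) terabytes to binary gigabytes."""
    return terabytes * 10**12 / 2**30


needed = footage_gb(RUNTIME_MIN, SHOOTING_RATIO, DATA_RATE_MB_PER_S)
available = decimal_tb_to_binary_gb(2)

print(f"{needed:.0f} GB")  # prints "1263 GB"
print(f"headroom: {available - needed:.0f} GB")
```

The headroom figure is what is left over for proxies and safety margin once the decimal-versus-binary difference is accounted for.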
The sweet spot for hard drives right now is 500 GB drives. 750 GB drives, which are 50% larger, cost more than twice as much. We’re going to need five of these drives. A reasonable price for a 500 GB hard drive right now (this will, of course, change next month) is $140, so our raw cost just for the drives is $700.
If we’re building a 2 TB array out of 500 GB drives, why do we want five of them, rather than four? Well, if you create a striped RAID across four drives, and any one of them fails, your data is toast. Even if you leave them as separate volumes, if any one of them fails, 1/4 of your data is toast. And, of course, you’re roughly four times as likely to have a failure with four drives as with one. Sure, hopefully you’ve got everything backed up, but you could still lose a fair bit of work.
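To put a rough number on the failure odds, here is a small sketch. It assumes drives fail independently, and the 3% per-drive failure probability is a made-up placeholder, not a measured figure for any real drive.

```python
def p_any_failure(p_single, n_drives):
    """Probability that at least one of n independent drives fails."""
    return 1 - (1 - p_single) ** n_drives


P_SINGLE = 0.03  # hypothetical per-drive failure probability over some period

for n in (1, 4, 5):
    p = p_any_failure(P_SINGLE, n)
    print(f"{n} drive(s): {p:.4f} ({p / P_SINGLE:.2f}x a single drive)")
```

For small per-drive probabilities the result with four drives comes out just under four times the single-drive figure, which is why the rule of thumb works.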
Fortunately, there is a solution to this problem, and it’s pretty cheap. That solution is distributed parity, which is a scheme that spreads parity information around all the drives in an array. This parity information can be used to recover all data if any one drive in the array is lost. This information, in a typical configuration, takes up an amount of space equivalent to the capacity of a single drive. So, in this case, we build a five drive array, and have the capacity of a four drive array, but if any one of our drives fails, we’re fine.
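For the curious, here is a minimal sketch of how parity recovery works, assuming the XOR parity used by RAID 5 style arrays. It models a single stripe only: four data blocks plus one parity block. A real array also rotates which drive holds the parity from stripe to stripe (the "distributed" part), which this sketch leaves out. The block contents are arbitrary sample bytes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: four data blocks, one per data drive.
data = [b"RED0", b"RED1", b"RED2", b"RED3"]

# The fifth block holds the parity of the other four.
parity = xor_blocks(data)

# Simulate losing the third drive: only the other data blocks and parity remain.
survivors = data[:2] + data[3:] + [parity]

# XOR-ing everything that survived reproduces the missing block.
rebuilt = xor_blocks(survivors)
assert rebuilt == data[2]
```

The same trick works no matter which single block is lost, including the parity block itself, but it cannot recover from two simultaneous failures.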
The second part of this post, tying everything together, will be up tomorrow.
Update: Will be up on Saturday. Sorry, business is ramping up and I’ve been insanely busy the last couple of days.