Archive for February, 2007

How to store data #2: eSATA RAID

This is a follow-up to a previous post, in which I laid out the principles of how storage should be managed. This post will be the first of two which go into detail about hardware, software, and how to set everything up.

If you’re not already familiar with RAID, go read the Wikipedia entry, although this post will address the subject as it applies specifically to the task at hand.

This post will focus on how to store REDCODE-compressed 4K footage or other types of media with moderate data rates. I haven’t done sufficient testing on such commodity storage systems to say whether it’s possible to scale performance up enough to handle uncompressed 4K. The 28 MB/s of REDCODE RAW 4K is not a particularly high data rate for modern desktop drives; even a single drive can handle it without much trouble. That means the recommendations in this post will focus on achieving low cost, high reliability, and simple management, not the highest possible data transfer rates.

As I mentioned in my last post on this subject, storage should be centrally managed. This means building a single storage array, or at least one array per project, rather than e.g. letting external FireWire drives proliferate around an office.

The options for this used to be quite expensive, but fortunately, a technology has come along in recent years which makes things much cheaper and easier: eSATA. And, specifically, eSATA + port multiplication.

Port multiplication, for our purposes, is useful because it lets an external enclosure contain multiple drives, which show up as separate drives to the host computer. While this was technically possible with some SCSI variants, it was expensive, and while there were FireWire enclosures that did this, the more limited bandwidth of FireWire made it somewhat unappealing. eSATA at SATA II speeds supports transfer rates of up to 300 MB/s, enough that with a typical drive array, the interface is not likely to become a bottleneck.

How much storage are we shooting for here? Well, we’re trying to keep this fairly cheap, so let’s start by shooting for enough storage for our first feature, which will be 110 minutes, and will be shot at a 7:1 shooting ratio. That’s 770 minutes of footage, at ~1680 MB/minute, or 1263 GB. We’ll also need space for editing proxies and a bit of a safety margin, and drive vendors cheat by using decimal rather than binary gigabytes… so let’s call the whole array 2 TB. We’d also probably prefer to have some hardware redundancy, so we’ll factor that in as well.
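The arithmetic above can be checked with a few lines of Python. This is only a sketch: the figures come from the post, the function and variable names are made up for illustration, and it assumes the 1263 GB figure was reached by dividing megabytes by 1024.

```python
# Figures from the post; names are illustrative only.
REDCODE_MB_PER_SEC = 28   # REDCODE RAW 4K data rate
RUNTIME_MIN = 110         # length of the finished feature
SHOOTING_RATIO = 7        # 7:1 shooting ratio


def footage_size_gb(runtime_min, ratio, mb_per_sec):
    """Return (minutes of footage, size in GB, counting 1024 MB per GB)."""
    minutes = runtime_min * ratio
    megabytes = minutes * mb_per_sec * 60
    return minutes, megabytes / 1024


minutes, gigabytes = footage_size_gb(RUNTIME_MIN, SHOOTING_RATIO, REDCODE_MB_PER_SEC)
print(f"{minutes} minutes of footage, about {gigabytes:.0f} GB")
```

Changing the shooting ratio or the data rate shows quickly how much headroom a 2 TB array really leaves.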

The sweet spot for hard drives right now is 500 GB drives. 750 GB drives, which are 50% larger, cost more than twice as much. We’re going to need five of these drives. A reasonable price for a 500 GB hard drive right now (this will, of course, change next month) is $140, so our raw cost just for the drives is $700.

If we’re building a 2 TB array out of 500 GB drives, why do we want five of them, rather than four? Well, if you create a striped RAID across four drives, and any one of them fails, your data is toast. Even if you leave them as separate volumes, if any one of them fails, 1/4 of your data is toast. And, of course, you’re four times as likely to have a failure with four drives as with one. Sure, hopefully you’ve got everything backed up, but you could still lose a fair bit of work.
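To put a rough number on "four times as likely", here is a small sketch. It assumes drives fail independently, and the 3% annual failure rate is a made-up placeholder, not a measured figure.

```python
def p_any_failure(p_single, n_drives):
    """Chance that at least one of n independent drives fails."""
    return 1 - (1 - p_single) ** n_drives


P_SINGLE = 0.03  # hypothetical annual failure rate for one drive

one = p_any_failure(P_SINGLE, 1)
four = p_any_failure(P_SINGLE, 4)
print(f"one drive: {one:.3f}, four drives: {four:.3f}, ratio: {four / one:.2f}")
```

Under these assumptions the ratio comes out slightly under four, so "four times as likely" is a fair rule of thumb when individual failure rates are small.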

Fortunately, there is a solution to this problem, and it’s pretty cheap. That solution is distributed parity, which is a scheme that spreads parity information around all the drives in an array. This parity information can be used to recover all data if any one drive in the array is lost. This information, in a typical configuration, takes up an amount of space equivalent to the capacity of a single drive. So, in this case, we build a five drive array, and have the capacity of a four drive array, but if any one of our drives fails, we’re fine.
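Parity of this kind is commonly computed with XOR, and a toy version fits in a few lines. This is a sketch, not how any particular controller is implemented: the four "drives" are just short byte strings, and the parity is kept in one place, whereas a real distributed-parity array rotates the parity block from drive to drive.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks, one per data drive (contents are arbitrary).
data = [b"REDC", b"ODE4", b"KRAW", b"FILM"]
parity = xor_blocks(data)

# Pretend the third drive died, then rebuild it from the survivors plus parity.
lost = 2
survivors = [block for i, block in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data[lost])
```

The same trick works whichever single block is lost, which is why one drive’s worth of parity protects the whole array against any one failure, but not against two at once.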

The second part of this post, tying everything together, will be up tomorrow.

Update: Will be up on Saturday. Sorry, business is ramping up and I’ve been insanely busy the last couple of days.

Evolution of Digital Media Creation Technologies

The post on getting (almost) enterprise-class storage for cheap that I promised a couple of days ago isn’t quite ready yet, so here’s a post on the adoption cycle for digital media creation technologies. It’s good background for much of the discussion on this blog, which is, after all, largely about how commodity technology is bringing down the cost of high-quality production and post-production.

There are basically three stages in the adoption of digital technology for creating media content:

  1. A given task is only possible using high-cost specialty products.

  2. A given task is possible using a low-cost commodity approach, but high-cost specialty products offer significant advantages.

  3. Computing power, storage capacity, etc. reach the point where commodity hardware is good enough that there’s no longer a major advantage to using high-end specialty products.

Something like 2K/4K color grading is presently at the first stage, inching toward the second stage… but the insanely fast rate of progress with commodity graphics processing hardware, combined with compressed workflow solutions, could push this into the third stage within a few years.

Non-linear editing is at the second stage. Final Cut Pro exists, but Avid’s higher end stuff still sells. Fast commodity graphics processors are almost certainly going to push NLE fully into the third stage over the next few years. If Avid is smart, they’re gearing up to deal with a market where they’ll have to push more units, at much lower prices. Apple is already there.

What’s really interesting is to look slightly farther afield, to an industry where this cycle has been completed for a substantial period of time. Desktop publishing provides such a glimpse of the future.

In the desktop publishing world, the arguments you still see today in the filmmaking world (about how commodity technology will never be suitable, how lowering barriers to entry will flood the market with inferior product, etc.) all died years ago, because a large fraction of the current generation of DTP folks got their start after high-priced specialty systems were already dead; many probably aren’t aware they ever existed. Beginners, even amateurs, have access to the same tools the pros use. There is no real barrier to entering the industry except for talent.

Filmmaking will never quite reach this stage, because some aspects of it are just inherently more expensive… but many of the “old guard” are going to be surprised by just how close it can come.

How to store data #1: background

This post is a follow-up to a post from a couple of weeks ago, “How not to store data”, in which I admonished against the all-too-common practice at small production facilities of stashing data on lots of external hard drives with no coherent management plan.

The single most important characteristic of a storage system is reliability. And the most important thing to realize about reliability is that it’s not something achieved entirely through technological means; it’s something that emerges from a combination of technology and well-planned, properly followed procedures. That’s why this post isn’t a list of products to buy (though that comes next).

Drives fail, backup media goes bad, things get accidentally deleted. In order to store data reliably, you have to have redundancy. And in order to have confidence in the reliability of your storage, you have to know you have redundancy. This means on multi-person projects, you can’t just let everyone handle their own data in whatever way they want.

Why not just stick to the strategy of letting lots of external drives float around, but make a rule that everyone has to make sure there are at least two copies of everything? In my experience, such rules often go unheeded, and at any rate, if multiple people are interacting with the same data, everyone will assume someone else has taken care of that, when they haven’t, leaving your data unprotected — or assume they haven’t, when they have, leaving you with extra unnecessary copies of things.

The best strategy for small production facilities (including everything down to one-man shops) is to scale down the sort of approach used in well-structured enterprise environments, not to try to scale up the “stash it on the external drive” approach that works so well with your 13-year-old cousin’s iMovie projects. Fortunately, it is now possible to do this at very little additional cost.

The key to this approach is centralization — both of the physical hardware and of responsibility. The former means building a storage array, rather than just buying individual drives. The latter means having a single person who’s responsible for the care and feeding of all the organization’s data.

Coming up shortly: what to buy, how to set it up and how to use it.

ZFS in OS X Leopard

Having heard several completely mangled attempts to explain the benefits of ZFS (reliably believed to be a feature in the next version of OS X) in various places (including on This Week in Media a few weeks back), I feel I should probably take a crack at explaining what it is and, more to the point, why it matters to indie filmmakers. Because it does matter, a lot.

This post is background for an upcoming post on building a cheap RAID to hold all those hours of 4K footage you’ll be shooting in a few months.

There’s tons of information about ZFS around the web, but much of it is fairly technical, and as far as I’ve seen there’s almost nothing that explains in concrete terms what makes it different from what we’ve got now, and why it’s so important to IT production workflow.

Basically, ZFS is a much more flexible way of handling data storage than what traditional file systems provide. Traditional file systems work with discrete volumes, which may span one disk, or, with RAID, more than one disk.

With ZFS, instead of having inflexible volumes, you create “storage pools” across multiple physical disks. Pools are flexible; if you create a pool spanning four drives and need to add a fifth drive, you can do that without having to recreate any of the file systems in the pool. You can even remove a drive from a pool, assuming there’s enough space to store all the data without it. With a simple command, ZFS will rearrange your data so the drive you specify is no longer necessary, after which it can be removed.

Drives in the pool can be used for any combination of striping, mirroring, or RAID Z (which is sort of like RAID 5), plus there’s support for hot spares.

With traditional RAID setups, though, if you want to add drives or rearrange things, the whole array has to be backed up and reformatted, or you have to use pricey volume management tools which could take many hours to move your data around, during which time your storage is unavailable. With ZFS, none of this is necessary.

Once a pool is created, file systems can be created and rearranged within that pool extremely easily; it’s nearly as easy as creating or rearranging directories presently is. As many file systems as you’d like to create can share a storage pool.

ZFS also has built-in pervasive checksumming features so it’ll automatically detect if your data gets corrupted (and recover it, if you’ve set things up with some amount of redundancy). And because of its architecture, the data on disk is always in a consistent state, eliminating the need for file system repair utilities and the speed hit associated with journaling.
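The detect-and-repair idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ZFS’s actual on-disk format or code path: the `checksum` and `read_with_repair` helpers are hypothetical, SHA-256 is used simply because it’s in the standard library, and the "mirror" is just a list holding two copies of a block.

```python
import hashlib


def checksum(block):
    """Checksum stored separately from the data it describes."""
    return hashlib.sha256(block).hexdigest()


def read_with_repair(copies, expected):
    """Return a copy that matches the checksum and overwrite any bad copies."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies are corrupt")
    for i in range(len(copies)):
        if checksum(copies[i]) != expected:
            copies[i] = good  # heal the damaged copy from the good one
    return good


original = b"frame 0001"
expected = checksum(original)
mirror = [b"frame 0O01", original]  # first copy silently corrupted

data = read_with_repair(mirror, expected)
print(data == original, mirror[0] == original)
```

Without redundancy the checksum can still tell you a block is bad; it just has nothing to repair it from, which is the case the `IOError` stands in for.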

Right now your options for using ZFS are buying ludicrously expensive Sun gear, or downloading OpenSolaris x86 and trying to piece together a working system on your own. (I spent some time trying to figure out what hardware would work for an OpenSolaris-based storage server… it’s not easy to find good information.)

Having support in OS X will make things a lot easier. Up next: discussion of how you can leverage ZFS in OS X to get most of the benefits of enterprise-class storage at a fraction of the price.

Putting everything on the line. Or not.

We’ve all heard stories about people who have sold their houses or run up $30,000 in credit card debt to make their movies. And people in some quarters have accused a lot of Red fans of this sort of thing… of buying equipment they can’t afford with little idea of how they’ll ever pay it off. While perhaps some people are doing this, it seems pretty clear to me that there’s a large difference between blowing your life savings making a single movie, and buying a Red package.

With film, if you really want to make a movie that nobody will fund, you go into debt, sell your stuff, etc. and spend the money on camera rental, film stock, telecine, opticals, whatever… and at the end (assuming your money actually holds out, which it often doesn’t), you have a conformed negative, and basically nothing else. If someone comes along and buys the movie, great. If not, you’re screwed. If you really went out on a limb to make that movie, you’re going to spend years recovering financially.

With Red and other low-cost digital options, for less than the price of paying for all of the above for a single movie, you can buy a camera package and an editing system outright, and the only completely unavoidable per-feature cost becomes a couple thousand bucks worth of hard drives. If nobody buys your first feature, make another. And another. And another. You own equipment which you can rent out during your downtime, and which can be used for whatever shooting or editing jobs you can find. And if you ever really decide to call it quits, you can probably sell your equipment for a decent fraction of what you paid for it.

While buying a Red package might still impose significant financial hardship on some people, when compared with throwing your entire net worth into making a 35mm film, with Red you’re taking less risk and you have a much better chance of getting a return on your investment. There’s a whole business model you can build around a Red package, beyond the “make feature film; try to sell” routine.